1. Introduction. Over the last two decades, many computationally intensive
nonparametric techniques and theories have been boldly developed to exploit
possible hidden structures and to reduce the modeling biases of traditional
parametric methods. Methods such as local polynomial fitting, spline
approximations and orthogonal series expansions, as well as dimensionality
reduction techniques, have been studied in great depth in various statistical
contexts. Yet there are no generally applicable methods available for
inference in nonparametric models. Various efforts have been made in the
literature on nonparametric hypothesis testing. See, for example, Bickel and
Ritov (1992), Eubank and Hart (1992), Härdle and Mammen (1993), Azzalini and
Bowman (1993), Fan (1996), Fan and Li (1996), Spokoiny (1996), Inglot and
Ledwina (1996), Kallenberg and Ledwina (1997) and Horowitz and Spokoiny
(2001, 2002), among others. For an overview, see the recent book by Hart
(1997). Adaptive minimax rate results have been obtained by various authors,
including Fan (1996), Spokoiny (1996), Horowitz and Spokoiny (2001), Fan and
Huang (2001) and Fan, Zhang and Zhang (2001). However, most of these studies
focus only on the one-dimensional nonparametric regression problem, and they
are difficult to extend to multivariate semiparametric and nonparametric
models.

In an effort to derive a generally applicable testing procedure for
multivariate semiparametric and nonparametric models, Fan, Zhang and Zhang
(2001) proposed generalized likelihood ratio tests. The work is motivated by
the fact that the nonparametric maximum likelihood ratio test may not exist
in many nonparametric problems. Further, even if it exists, it is not optimal
even in the simplest nonparametric regression setting. Generalized likelihood
ratio statistics, obtained by replacing unknown functions by reasonable
nonparametric estimators rather than by the MLE as in the parametric setting,
have several nice properties. In the varying coefficient model

(1.1) $Y = a_1(U)X_1 + \cdots + a_p(U)X_p + \varepsilon,$

where $(U, X_1, \ldots, X_p)$ are independent variables and $Y$ is the
response variable, Fan, Zhang and Zhang (2001) unveil the following Wilks
phenomenon: the asymptotic null distributions are independent of nuisance
functions and follow a $\chi^2$-distribution (in a generalized sense) for
testing the homogeneity

(1.2) $H_0: a_1(\cdot) = \theta_1, \ldots, a_p(\cdot) = \theta_p$

and for testing the significance of variables, such as

(1.3) $H_0: a_1(\cdot) = a_2(\cdot) = 0.$

In other words, the generalized likelihood ratio statistic $\lambda_n$
asymptotically follows a rescaled $\chi^2$-distribution in the sense that
$(2b_n)^{-1/2}(r_K\lambda_n - b_n) \overset{D}{\longrightarrow} N(0, 1)$
for a sequence $b_n \to \infty$ and a constant $r_K$. We will use the
notation $r_K\lambda_n \overset{a}{\sim} \chi^2_{b_n}$ to denote this result.
The significance of the result is that the scale constant $r_K$ and the
degrees of freedom $b_n$ are independent of nuisance parameters, such as the
joint density of $(U, X_1, \ldots, X_p)$, the parameters
$\theta_1, \ldots, \theta_p$ in (1.2) and the functions
$a_3(\cdot), \ldots, a_p(\cdot)$ in (1.3). This Wilks phenomenon is the key
to the success of the classical maximum likelihood ratio tests for parametric
problems. With this newly discovered Wilks phenomenon in nonparametric
models, P-values can easily be computed either from the asymptotic
distributions or by simulation, fixing the nuisance parameters or functions
under the null hypothesis at certain values of interest. Furthermore, Fan,
Zhang and Zhang (2001) showed that the resulting tests are asymptotically
optimal in the sense of Ingster (1993).

The idea of the above generalized likelihood method is widely applicable in
semiparametric and nonparametric models. It is easy to use because of the
Wilks phenomenon, and it is powerful because it achieves the optimal rate of
convergence. Yet one needs to specify the parametric form of the error
distribution, such as that of $\varepsilon$ in (1.1), in order to construct
the generalized likelihood ratio statistic. While the procedure based on the
normal likelihood may still be applicable when $\varepsilon$ is
homoscedastic, it may not be efficient. When the error distribution is
heteroscedastic with variance $\operatorname{var}(\varepsilon \mid U) = \sigma^2(U)$,
the construction of the generalized likelihood ratio test statistic requires
knowledge of the variance function $\sigma^2(\cdot)$. This motivates us to
propose the sieve empirical likelihood ratio (SELR) test statistic for the
case where the exact form of the error distribution is unknown but some
qualitative traits of the distribution are known. A popular model is to
assume

(1.4) $E[G(\varepsilon) \mid U] = 0,$

where $G = (G_1, \ldots, G_{k_0})^T$ is a $k_0$-dimensional function [see
Owen (1990), Newey (1993) and Zhang and Gijbels (2003)]. This is a much less
restrictive assumption than a parametric form for the distribution of
$\varepsilon$. In particular, when the conditional distribution of
$\varepsilon$ given $U$ is symmetric about 0, we may choose a sequence of
$k_0$ grid points, say, $0 = s_0 < s_1 < \cdots < s_{k_0}$, and take

(1.5) $G_k(\varepsilon) = I(\varepsilon \in [s_{k-1}, s_k]) - I(-\varepsilon \in [s_{k-1}, s_k]), \quad 1 \le k \le k_0,$

or a smoother version of the function $G$, where $I(\cdot)$ is the indicator
function. Note that as $\max_{1 \le k \le k_0}(s_k - s_{k-1}) \to 0$ and
$k_0 \to \infty$, these restrictions are essentially the same as the
symmetry assumption on the distribution of $\varepsilon$.
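To make (1.5) concrete, the following minimal Python sketch evaluates the vector $(G_1(\varepsilon), \ldots, G_{k_0}(\varepsilon))^T$ for a user-supplied grid $0 = s_0 < s_1 < \cdots < s_{k_0}$. The function name `symmetric_g` and the grid and sample values are hypothetical choices for illustration. The closed intervals follow (1.5) literally, so a point lying exactly on an interior grid value contributes to two neighboring components.

```python
def symmetric_g(eps, grid):
    """Evaluate G(eps) = (G_1(eps), ..., G_{k0}(eps)) as in (1.5).

    grid = [s_0, s_1, ..., s_{k0}] with 0 = s_0 < s_1 < ... < s_{k0};
    G_k(eps) = I(eps in [s_{k-1}, s_k]) - I(-eps in [s_{k-1}, s_k]).
    """
    if grid[0] != 0 or any(a >= b for a, b in zip(grid, grid[1:])):
        raise ValueError("grid must satisfy 0 = s_0 < s_1 < ... < s_k0")
    return [
        float(lo <= eps <= hi) - float(lo <= -eps <= hi)
        for lo, hi in zip(grid[:-1], grid[1:])
    ]


# For a sample that is symmetric about 0, every component sums to zero,
# which is the empirical counterpart of E[G(eps) | U] = 0 in (1.4).
grid = [0.0, 0.5, 1.0, 2.0]  # k0 = 3 (hypothetical grid)
sample = [-1.5, 1.5, -0.2, 0.2, 0.7, -0.7]
sums = [sum(col) for col in zip(*(symmetric_g(e, grid) for e in sample))]
print(symmetric_g(0.7, grid))  # [0.0, 1.0, 0.0]
print(sums)                    # [0.0, 0.0, 0.0]
```

Observations outside $[-s_{k_0}, s_{k_0}]$ give $G = 0$ and carry no information, so in practice $s_{k_0}$ would be chosen to cover most of the error distribution.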

A few questions related to the SELR test arise naturally. First, it is not
clear how to construct an empirical likelihood in the nonparametric setting.
Second, it is not obvious whether a particular construction of the empirical
likelihood ratio statistic will follow a Wilks-type result. Third, it is not
guaranteed that the resulting test statistic is asymptotically optimal in the
sense of Ingster (1993). Finally, it remains unknown whether the empirical
likelihood ratio statistics will adapt to the unknown distribution of
$\varepsilon$, including heteroscedasticity. These issues are poorly
understood and need to be studied.

The technical derivations for SELR tests are very involved. To ease some of
the technical burden, we choose the varying coefficient model (1.1) for our
investigation. The model arises in various contexts and has been widely
used. For example, in many biomedical studies one frequently encounters the
question of the extent to which the effect of exposure variables on the
response variable changes with the level of a confounding covariate (e.g.,
age). See, for example, Cleveland, Grosse and Shyu (1991), Hastie and
Tibshirani (1993) and Carroll, Ruppert and Welsh (1998). The model can also
be used for predicting group behavior in economics, where different groups
are allowed to have different coefficients. In longitudinal studies,
investigators often want to examine how the effects of covariates on
response variables change over time [Brumback and Rice (1998) and Wu, Chiang
and Hoover (1998)]. In nonlinear time series, the model allows different
autoregressive models for different regimes of state variables [Chen and Tsay
(1993) and Cai, Fan and Yao (2000)]. It includes the threshold
autoregressive model [Tong (1990)] as a specific example. The model has been
successfully applied by Hong and Lee (2003) to the inference and forecasting
of exchange rates. Thus, our study of model (1.1) has direct implications
for the above problems.

For the varying coefficient model (1.1), the question of whether the
coefficient functions are really varying, or whether certain covariates are
statistically significant, frequently arises. This leads to the problem of
testing for homogeneity (1.2) or the problem of testing for significance such
as (1.3). As will be explained at the end of Section 2, these problems can be
reduced to that of testing against a specific null hypothesis:

$H_0: a_1(\cdot) = a_{10}(\cdot), \ldots, a_p(\cdot) = a_{p0}(\cdot)$

for some given functions $a_{10}, \ldots, a_{p0}$. Our approach is first to
construct the local linear estimator of the coefficient functions
$a_1, \ldots, a_p$ via a local version of the empirical likelihood, and then
to substitute the estimate into a special sieve empirical likelihood [see
Zhang and Gijbels (2003) and Zhang and Liu (2003)]. This allows us to form
the empirical likelihood ratio statistics. We will show that the proposed
SELR procedures follow Wilks-type results under more relaxed assumptions on
the distribution of the error $\varepsilon$. This provides a useful extension
of the results given by Fan, Zhang and Zhang (2001). Note that our procedure
is very different from that of Kitamura (1997), who considered testing
problems for finite-dimensional parameters in weakly dependent processes. He
first used a local (blocking) approximation to construct a global estimating
equation and then applied Owen's procedure directly. For the full
nonparametric regression model, Chen, Härdle and Li (2003) developed a very
different version of the empirical likelihood ratio test, using a
kernel-smoothed parametric estimator under the null hypothesis as ancillary
information. The idea has been nicely extended by Chen, Gao and Li (2003) to
simultaneously testing the parametric forms of the mean and variance
functions. Horowitz and Spokoiny (2002) developed a different test for one
special case of the model (1.4). The test shares most of the nice features
listed for the SELR test and includes an automatic selection of the smoothing
parameter. It is not clear whether the Horowitz-Spokoiny test is adaptive to
the error distribution under the alternative hypothesis, as the SELR test is.
Furthermore, because of the saturated alternative, the curse-of-dimensionality
problem arises in implementation and power.

Our empirical likelihood ratio method also applies to nonparametric tests on
density functions. As an illustration that does not require a new
statistical setting, we regard the constraints (1.4) as a null hypothesis. We
will demonstrate that the Wilks type of phenomenon continues to hold for this
nonparametric testing problem.

Our studies have implications for other nonparametric models. When $p = 1$
and $X = 1$, the model is the nonparametric regression model studied by many
authors. Our results can be directly applied to the problems of testing
parametric models against the nonparametric alternative. Further, our
theoretical results shed some light on the validity of the Wilks phenomenon
in other models, such as additive models and models under certain mixing
conditions.

When $p = 1$, $X = 1$ and the coefficient function $a_1(\cdot) = \theta$,
under the constraints (1.4) and (1.5) the model becomes a one-sample
symmetric location model, which has been well studied, for instance, by
Hettmansperger (1984) and Bickel, Klaassen, Ritov and Wellner (1993). In
Section 2, we find that, for this special case, the first step in our
procedure essentially makes efficient use of the information on the
stochastic error [Owen (1988) and Zhang and Liu (2003)]. Moreover, the second
step makes the likelihood ratio statistic adaptive to heteroscedasticity. As
a result, our procedure has two advantages over parametric assumptions on the
error distribution. First, it requires only some conditional estimating
equations such as (1.4) rather than the whole distribution of the stochastic
error. Second, under the null hypothesis the SELR statistic asymptotically
follows a rescaled $\chi^2$-distribution whose scaling constant and degrees
of freedom are independent of the conditional distribution of $\varepsilon$,
even if the stochastic error is heteroscedastic in $U$. The procedure and
results can easily be generalized to the more general constrained regression
model in Zhang and Gijbels (2003).

The paper is organized as follows. In Section 2 the sieve empirical
likelihood ratio statistics are introduced for testing the goodness-of-fit of
the estimating equations and for testing some simple and composite null
hypotheses. In Section 3 the asymptotic null and nonnull distributions of
these statistics are derived. In Section 4 a simulation study is conducted to
evaluate the performance of the proposed procedure empirically. The technical
conditions and the proofs are relegated to Section 5. The technical lemmas
are given in the Appendix.

J. FAN AND J. ZHANG


2. Sieve empirical likelihood. It is more convenient to work with the matrix
notation for model (1.1),

(2.1) $Y = A^T(U)X + \varepsilon,$

where $Y$ is the response, $U \in \Omega \subset \mathbb{R}^1$ (with $\Omega$
bounded) and $X \in \mathbb{R}^p$ are covariates, $\varepsilon$ is the
stochastic error and $A(u) = (a_1(u), \ldots, a_p(u))^T$ is the vector of
varying coefficients. Let $\{(Y_i, X_i, U_i)\}_{i=1}^n$ be an i.i.d. random
sample from model (2.1) with the restriction (1.4). According to Owen (1990),
to construct an empirical likelihood that can identify an infinite-dimensional
parameter such as $A(u)$ in (2.1), we need to establish an infinite number of
unconditional estimating equations. Such a likelihood is often theoretically
intractable. To overcome this difficulty, Zhang and Gijbels (2003) proposed a
general procedure to build a sieve empirical likelihood via local
approximation. For model (2.1) the procedure consists of two steps. First,
construct $n$ local empirical likelihoods, one at each $U_j$, that can
locally identify $A(u)$ for $u \approx U_j$. These local empirical
likelihoods lead to a weighted approximation of the logarithm of the
conditional probability mass $dP_{(Y,X) \mid U = U_j}(Y_j, X_j)$. Then a
log-likelihood is obtained simply by summing up all of these approximated
logarithms. In the first step, we will use the local linear approximation of
the nonparametric coefficient functions $A(\cdot)$ [see Fan (1992), Fan and
Zhang (1999) and Cai, Fan and Li (2000)]. In other words, in a neighborhood
around a given point $u_0$, approximate $A(\cdot)$ by

$A(u) \approx A(u_0) + A'(u_0)(u - u_0)$ for $u \approx u_0$.


Thus, around the point $u_0$, model (2.1) and restriction (1.4) can be
written, respectively, as

(2.2) $Y \approx \beta_A(u_0)^T Z\left(X, \frac{U - u_0}{h}\right) + \varepsilon$ for $U \approx u_0$, and $E\left[G\left(Y - \beta_A(u_0)^T Z\left(X, \frac{U - u_0}{h}\right)\right) \,\middle|\, U = u\right] \approx 0$ for $u \approx u_0$,

where $\beta_A(u_0) = (A^T(u_0), hA'^T(u_0))^T$ and $Z(X, t) = (X^T, tX^T)^T$.
This is indeed a local linear model [Fan (1992)]. To incorporate the local
linear model, let $h$ represent the size of the local neighborhood where the
approximation is valid and let $K$ be a weight function, which is a symmetric
probability density function. Let $p_i$, $i = 1, \ldots, n$, be the
conditional empirical probability mass of $(X, Y)$ given $U = u_0$ placed on
the $i$th data point $(X_i, Y_i)$.







SIEVE EMPIRICAL LIKELIHOOD RATIO 




Suppose that, given $U$, $\varepsilon$ and $X$ are independent. Then the
conditional constraints (2.2) can be translated into the following
unconditional estimating equation:

$\sum_{i=1}^n p_i G_{ih}(u_0, \beta_A(u_0)) = 0,$

where

$G_{ih} = G_{ih}(u_0, \beta) = G\left(Y_i - \beta^T Z\left(X_i, \frac{U_i - u_0}{h}\right)\right) \otimes Z\left(X_i, \frac{U_i - u_0}{h}\right),$

with $\otimes$ being the Kronecker product, $\beta = (A^T, hB^T)^T$,
$A = (a_1, \ldots, a_p)^T$ and $B = (b_1, \ldots, b_p)^T$. To see why we need
the extra factor $Z(X_i, (U_i - u_0)/h)$ in the unconditional estimating
function $G_{ih}$, let $G(\varepsilon) = \varepsilon$ temporarily. It is a
well-known fact that in the linear model the product of the residual and the
covariates is a good estimating function for the parameter $\beta_A$. This
leads to the estimating equation

$\sum_{i=1}^n p_i \left\{Y_i - \beta_A(u_0)^T Z\left(X_i, \frac{U_i - u_0}{h}\right)\right\} Z\left(X_i, \frac{U_i - u_0}{h}\right) = 0.$

In light of this fact, for a general $G$ we build the estimating equation by
multiplying each component of $G$ by the covariate vector
$Z(X_i, (U_i - u_0)/h)$, which yields the form $G_{ih}$.
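The construction of $G_{ih}$ can be made concrete with a short pure-Python sketch. It is an illustration under stated assumptions, not the authors' code: vectors are plain lists, `g` is any callable returning the $k_0$-vector $G(e)$, and the helper names `z_vector`, `kron` and `local_estimating_function` are hypothetical. The Kronecker product is ordered as in $G \otimes Z$, so the first $2p$ entries are $G_1(e)Z$, the next $2p$ are $G_2(e)Z$, and so on.

```python
def z_vector(x, t):
    """Z(X, t) = (X^T, t X^T)^T, the local linear design vector of length 2p."""
    return list(x) + [t * xj for xj in x]


def kron(a, b):
    """Kronecker product of two vectors: (a_1 b, a_2 b, ...)."""
    return [ai * bj for ai in a for bj in b]


def local_estimating_function(g, y, x, u, u0, beta, h):
    """G_ih(u0, beta) = G(Y_i - beta^T Z(X_i, t_i)) (x) Z(X_i, t_i), t_i = (U_i - u0)/h.

    g    : callable returning the k0-vector G(e) as a list
    beta : (A^T, h B^T)^T as a list of length 2p
    """
    z = z_vector(x, (u - u0) / h)
    if len(beta) != len(z):
        raise ValueError("beta must have length 2p")
    residual = y - sum(bj * zj for bj, zj in zip(beta, z))
    return kron(g(residual), z)


# With G(e) = e (k0 = 1), G_ih is residual * Z, the least-squares
# estimating function used in the motivation above. Values are hypothetical.
g_ih = local_estimating_function(lambda e: [e], y=3.0, x=[1.0, 2.0],
                                 u=1.5, u0=1.0, beta=[0.5, 0.25, 0.0, 0.0], h=0.5)
print(g_ih)  # [2.0, 4.0, 2.0, 4.0]
```

The output has length $2pk_0$, which is the number of unconditional moment conditions that each local empirical likelihood has to balance.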

Thus, following Owen (1988, 1990), the local empirical log-likelihood
function of $\beta$ is defined by

(2.3) $l(\beta, u_0) = \sup\left\{\sum_{i=1}^n w_h(U_i, u_0)\log p_i : p_i \ge 0,\ 1 \le i \le n,\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i G_{ih}(u_0, \beta) = 0\right\},$

where $w_h(U_i, u_0) = K_h(U_i - u_0)/\sum_{m=1}^n K_h(U_m - u_0)$ with
$K_h(\cdot) = K(\cdot/h)/h$. If we set $p_i = w_h(U_i, u_0)q_i$, then (2.3)
becomes

$l(\beta, u_0) = \sup\left\{\sum_{i=1}^n w_h(U_i, u_0)\log\{w_h(U_i, u_0)q_i\} : q_i \ge 0,\ 1 \le i \le n,\ \sum_{i=1}^n w_h(U_i, u_0)q_i = 1,\ \sum_{i=1}^n w_h(U_i, u_0)q_i G_{ih}(u_0, \beta) = 0\right\}.$

Analogously to Owen (1990) and Qin and Lawless (1994), if 0 is contained in
the convex hull of the points in
$\{G_{ih}(u_0, \beta) : w_h(U_i, u_0) > 0, 1 \le i \le n\}$, then an explicit
expression can be derived by the Lagrange multiplier method as follows:

$l(\beta, u_0) = \sum_{i=1}^n w_h(U_i, u_0)\log w_h(U_i, u_0) - \sum_{i=1}^n w_h(U_i, u_0)\log\{1 + \alpha_n^T(u_0, \beta)G_{ih}(u_0, \beta)\},$

where $\alpha_n(u_0, \beta)$ satisfies

(2.4) $\sum_{i=1}^n w_h(U_i, u_0)\frac{G_{ih}(u_0, \beta)}{1 + \alpha_n^T(u_0, \beta)G_{ih}(u_0, \beta)} = 0.$

Define the estimate of $\beta_A(u_0)$ by

(2.5) $\hat\beta(u_0) = \arg\max_{\beta} l(\beta, u_0).$
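The explicit Lagrange-multiplier form of $l(\beta, u_0)$ together with (2.4) can be evaluated numerically. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes the Epanechnikov kernel for $K$ and finds $\alpha_n$ by damped Newton ascent on the concave function $F(\alpha) = \sum_i w_h(U_i, u_0)\log(1 + \alpha^T G_{ih})$, whose stationarity condition is exactly (2.4). The data are hypothetical residuals with $p = 1$, $X = 1$ and $G(e) = e$, so that $G_{ih} = (e_i, e_i t_i)^T$ with $t_i = (U_i - u_0)/h$. All function names are illustrative. Maximizing the returned value over $\beta$ would give (2.5); that outer step is not shown.

```python
import math


def epanechnikov(t):
    """Epanechnikov kernel, a symmetric density supported on [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def kernel_weights(us, u0, h, kernel=epanechnikov):
    """w_h(U_i, u0) = K_h(U_i - u0) / sum_m K_h(U_m - u0), K_h(.) = K(./h)/h."""
    k = [kernel((u - u0) / h) / h for u in us]
    total = sum(k)
    return [ki / total for ki in k]


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def el_multiplier(w, g, tol=1e-12, max_iter=100):
    """Damped Newton solve of (2.4): sum_i w_i g_i / (1 + alpha^T g_i) = 0."""
    d = len(g[0])

    def F(a):  # concave objective; -inf outside the domain of the log
        total = 0.0
        for wi, gi in zip(w, g):
            t = 1.0 + sum(x * y for x, y in zip(a, gi))
            if t <= 0.0:
                return -math.inf
            total += wi * math.log(t)
        return total

    alpha = [0.0] * d
    for _ in range(max_iter):
        grad = [0.0] * d
        neg_hess = [[0.0] * d for _ in range(d)]
        for wi, gi in zip(w, g):
            t = 1.0 + sum(x * y for x, y in zip(alpha, gi))
            for r in range(d):
                grad[r] += wi * gi[r] / t
                for c in range(d):
                    neg_hess[r][c] += wi * gi[r] * gi[c] / (t * t)
        if max(abs(v) for v in grad) < tol:
            break
        step = solve(neg_hess, grad)  # Newton ascent direction for F
        f0, s = F(alpha), 1.0
        cand = [a + s * st for a, st in zip(alpha, step)]
        while F(cand) < f0 - 1e-12 * (1.0 + abs(f0)) and s > 1e-10:
            s *= 0.5  # backtrack to stay in the domain and not decrease F
            cand = [a + s * st for a, st in zip(alpha, step)]
        if F(cand) == -math.inf:
            break
        alpha = cand
    return alpha


def local_log_el(w, g):
    """Return (l(beta, u0), alpha_n) from the explicit Lagrange-multiplier form."""
    pairs = [(wi, gi) for wi, gi in zip(w, g) if wi > 0.0]
    alpha = el_multiplier([wi for wi, _ in pairs], [gi for _, gi in pairs])
    value = sum(wi * (math.log(wi) - math.log(1.0 + sum(a * x for a, x in zip(alpha, gi))))
                for wi, gi in pairs)
    return value, alpha


# Hypothetical data: p = 1, X = 1, G(e) = e, so G_ih = (e_i, e_i t_i).
us = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]
resid = [0.7, -1.0, 0.9, -0.6, 0.1, 0.4, -0.8]
u0, h = 1.0, 0.35
w = kernel_weights(us, u0, h)
g = [[e, e * (u - u0) / h] for e, u in zip(resid, us)]
value, alpha = local_log_el(w, g)
```

At the solution the implied probabilities $p_i = w_h(U_i, u_0)/\{1 + \alpha_n^T G_{ih}\}$ sum to one, which is a convenient numerical check. Also, since $F(\alpha_n) \ge F(0) = 0$, the local log-likelihood never exceeds $\sum_i w_h \log w_h$.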


The first $p$ components of $\hat\beta(u_0)$, denoted by $\hat A(u_0)$, give
an estimate of $A(u_0)$, and the remaining components estimate $hA'(u_0)$.
Similarly to LeBlanc and Crowley (1995), an approximate empirical likelihood
for the nonparametric function $A$, called the sieve empirical likelihood,
can be introduced by adding the logarithms of the conditional likelihoods at
the data points:

$l(A \mid G) = \sum_{j=1}^n l(\beta_A, U_j).$

The name “sieve” originates from the following two facts:
$\{E[G(\varepsilon) \mid U = U_j]\}_{1 \le j \le n}$ is a sieve approximation
to the constraints (1.4), and $l(\beta_A, U_j)$ is a weighted approximation of
the logarithm of the conditional probability mass
$dP_{(Y,X) \mid U = U_j}(Y_j, X_j)$. See Zhang and Gijbels (2003) for a more
detailed explanation. Motivated by Fan, Zhang and Zhang (2001), we define the
logarithm of the sieve empirical likelihood under the nonparametric model
(2.1) with constraints (1.4) by substituting $\beta = \hat\beta$ into
$l(A \mid G)$, leading to

$l(\Theta \mid G) = \sum_{j=1}^n l(\hat\beta(U_j), U_j),$

with $\Theta$ denoting the space of $A$.

We now consider the nonparametric test concerning the density function of
$\varepsilon$. As a specific application of the sieve empirical likelihood,
we consider testing

(2.6) $H_{0g}: E[G(\varepsilon) \mid U] = 0,$

where $G$ is given in (1.4). Without the constraint (1.4), following the
above derivations, the corresponding logarithm of the sieve empirical
likelihood is

$l(\Theta \mid N) = \sum_{j=1}^n \sum_{i=1}^n w_h(U_i, U_j)\log w_h(U_i, U_j).$

Thus, we can construct a goodness-of-fit test of hypothesis (2.6) based on
the following logarithm of the SELR:

(2.7) $l(G) = -l(\Theta \mid G) + l(\Theta \mid N) = \sum_{j=1}^n \sum_{i=1}^n w_h(U_i, U_j)\log\{1 + \hat\alpha(U_j)^T G_{ih}(U_j, \hat\beta(U_j))\},$

where $\hat\alpha(u) = \alpha_n(u, \hat\beta(u))$.

Next, we consider the sieve likelihood ratio test for the nonparametric
coefficient function $A(\cdot)$ under the restriction (1.4). In the varying
coefficient model (2.1), we naturally ask whether the coefficients are really
varying or whether certain covariates are statistically significant. This
leads to the parametric null hypothesis

$H_{0p}: A(\cdot) = \theta.$

More generally, we wish to test the composite null hypothesis, which involves
the nuisance functions $A_2(\cdot)$:

(2.8) $H_{0u}: A_1 = A_{10} \longleftrightarrow H_{1u}: A_1 \ne A_{10},$

with $A_2(\cdot)$ completely unknown. This problem includes the test of
significance (1.3) under model (1.1) as a specific example. Here we write
$A(u) = (A_1^T(u), A_2^T(u))^T$, with $A_{10}(u)$ and $A_1(u)$ being
$p_1$ $(< p)$-dimensional. To construct the likelihood ratio statistic for
$H_{0u}$, we introduce the following notation:

$\beta_{2A}(u_0) = (A_2^T(u_0), hA_2'^T(u_0))^T, \quad \beta_2 = (A_2^T, hB_2^T)^T,$

$\beta = (A_{10}^T(u_0), A_2^T, hA_{10}'^T(u_0), hB_2^T)^T.$

Let

$\hat\beta_2(u_0) = (\hat A_2^T, h\hat B_2^T)^T = \arg\max_{\beta_2} l(\beta, u_0),$

$\hat\beta^*(u_0) = (A_{10}^T(u_0), \hat A_2^T, hA_{10}'^T(u_0), h\hat B_2^T)^T,$

and let the corresponding $\hat\alpha^*(u_0) = \alpha_n(u_0, \hat\beta^*(u_0))$
be implicitly defined by (2.4) with $\beta = \hat\beta^*(u_0)$. Then the SELR
statistic for $H_{0u}$ can be written as

(2.9) $l(H_{0u} \mid G) = -l(\Theta_{02} \mid G) + l(\Theta \mid G),$

where $\Theta_{02}$ denotes the space of $A_2$ and

$l(\Theta_{02} \mid G) = \sum_{j=1}^n l(\hat\beta^*(U_j), U_j).$

The SELR test for the semiparametric model in which $A(\cdot)$ has a certain
parametric form, such as the linear model, can be constructed analogously.
As in Fan, Zhang and Zhang (2001), the asymptotic null distributions of the
SELR statistics for composite null hypotheses can be derived from those for
simple hypotheses (see the next paragraph). This motivates us to consider

(2.10) $H_{0s}: A = A_0 \longleftrightarrow H_{1s}: A \ne A_0$

for a given $A_0$. Analogously to $l(H_{0u} \mid G)$, we can construct the
following likelihood ratio statistic:

(2.11) $l(H_{0s} \mid G) = -l(A_0 \mid G) + l(\Theta \mid G) = \sum_{j=1}^n \sum_{i=1}^n w_h(U_i, U_j)\log\{1 + \alpha_n(U_j, \beta_0)^T G_{ih}(U_j, \beta_0)\} - l(G),$

where $\beta_0$ denotes $\beta_{A_0}$. Note that when $A_0$ in $H_{0s}$ is
known, we can assume, without loss of generality, that $A_0 = 0$. This can be
accomplished through the simple transformation $A^* = A - A_0$. With this
transformation, (2.10) is equivalent to

(2.12) $H_{0s}': A^* = 0 \longleftrightarrow H_{1s}': A^* \ne 0.$

This specific problem has an advantage: the local linear estimator under the
null hypothesis is unbiased, and hence the null distribution can be
approximated more accurately.

We opt for a general $A_0$, since the results have implications for
composite null hypotheses. To appreciate this, consider the composite null
hypothesis testing problem

(2.13) $H_0: A \in \mathcal{A}_0 \longleftrightarrow H_1: A \notin \mathcal{A}_0,$

where $\mathcal{A}_0$ is a set of functions. Let $l(\mathcal{A}_0 \mid G)$ be
the sieve empirical likelihood under the hypothesis $H_0$ in (2.13). Then the
SELR statistic is simply

$\lambda_n = -l(\mathcal{A}_0 \mid G) + l(\Theta \mid G).$

Let $A_0'$ denote the true value of the parameter function $A$. Consider the
fabricated testing problems with the simple null hypotheses

(2.14) $H_0': A = A_0' \longleftrightarrow H_1': A \ne A_0'$

and

(2.15) $H_0': A = A_0' \longleftrightarrow H_1': A \in \mathcal{A}_0.$

Let $l(A_0' \mid G)$ be the sieve empirical likelihood under $H_0'$. Then the
SELR statistic for (2.13) can be written as

$\lambda_n = \lambda(A_0' \mid G) - \lambda^*(A_0' \mid G),$

where $\lambda(A_0' \mid G) = -l(A_0' \mid G) + l(\Theta \mid G)$ is the SELR
statistic for problem (2.14) and
$\lambda^*(A_0' \mid G) = -l(A_0' \mid G) + l(\mathcal{A}_0 \mid G)$ is the
SELR statistic for problem (2.15). Thus, the asymptotic representation of
$\lambda_n$ follows directly from those of $\lambda(A_0' \mid G)$ and
$\lambda^*(A_0' \mid G)$, which admit the form given by (2.11).

3. Asymptotic theory.


3.1. Asymptotic expansions. In order to obtain the properties of the SELR
statistics in (2.7) and (2.11), we first develop some uniform asymptotic
representations for the local sieve empirical likelihood estimator
$\hat\beta$ and the Lagrange multiplier $\hat\alpha$ in (2.4) and (2.5).
These results generalize those of Zhang and Liu (2003). They also indicate
the performance of the sieve empirical likelihood estimator. Using these
results, we will establish the asymptotic representations for $l(G)$ and
$l(H_{0s} \mid G)$ in (2.7) and (2.11). For simplicity of presentation, we
assume that $G$ is differentiable. Let $f(u_0)$ be the density of $U$ at the
point $u_0$. Set


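$D(u_0) = -E\left[\frac{dG(\varepsilon)}{d\varepsilon} \,\middle|\, U = u_0\right], \quad V(u_0) = E[G(\varepsilon)G^T(\varepsilon) \mid U = u_0],$

$\Gamma(u_0) = E[XX^T \mid U = u_0]f(u_0), \quad S = \begin{pmatrix} 1 & 0 \\ 0 & \mu_2 \end{pmatrix}, \quad \mu_2 = \int t^2 K(t)\,dt,$

$\eta_i(u_0) = -\{D(u_0)^T V^{-1}(u_0)D(u_0)\}^{-1}D^T(u_0)V^{-1}(u_0)G(\varepsilon_i),$

$C(u_0) = V^{-1}(u_0) - V^{-1}(u_0)D(u_0)\{D^T(u_0)V^{-1}(u_0)D(u_0)\}^{-1}D^T(u_0)V^{-1}(u_0),$

$\varepsilon_i = Y_i - A^T(U_i)X_i.$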
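Because $\varepsilon$ is scalar, $D(u_0)$ is a $k_0$-vector and $D^T V^{-1} D$ is a scalar, so $C(u_0)$ and $\eta_i(u_0)$ are cheap to compute once $D$ and $V$ are available. The pure-Python sketch below uses hypothetical values of $D$ and $V$ with $k_0 = 2$; the helper names are illustrative. It checks two identities that follow directly from the definitions: $C(u_0)D(u_0) = 0$ and $\operatorname{tr}\{C(u_0)V(u_0)\} = k_0 - 1$. It also confirms that $\eta_i = -1$ when $G(\varepsilon_i) = D$.

```python
def invert(a):
    """Gauss-Jordan inverse of a small nonsingular matrix given as nested lists."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [x / pv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def efficiency_terms(d, v):
    """Return V^{-1}, V^{-1} D and the scalar D^T V^{-1} D (V symmetric positive definite)."""
    v_inv = invert(v)
    vd = [sum(vij * dj for vij, dj in zip(row, d)) for row in v_inv]
    return v_inv, vd, sum(di * x for di, x in zip(d, vd))


def c_matrix(d, v):
    """C = V^{-1} - V^{-1} D (D^T V^{-1} D)^{-1} D^T V^{-1}."""
    v_inv, vd, dvd = efficiency_terms(d, v)
    k = len(d)
    return [[v_inv[i][j] - vd[i] * vd[j] / dvd for j in range(k)] for i in range(k)]


def eta(d, v, g_eps):
    """eta_i = -(D^T V^{-1} D)^{-1} D^T V^{-1} G(eps_i)."""
    _, vd, dvd = efficiency_terms(d, v)
    return -sum(x * gj for x, gj in zip(vd, g_eps)) / dvd


# Hypothetical values with k0 = 2.
D = [1.0, 0.5]
V = [[2.0, 0.5], [0.5, 1.0]]
C = c_matrix(D, V)
CD = [sum(cij * dj for cij, dj in zip(row, D)) for row in C]               # ~ [0, 0]
trace_CV = sum(sum(C[i][j] * V[j][i] for j in range(2)) for i in range(2))  # ~ k0 - 1
```

The scalar $D^T V^{-1} D$ returned by `efficiency_terms` is the quantity in which, as discussed below, the asymptotic efficiency of $\hat A(u_0)$ is increasing.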


Theorem 1. Suppose that conditions (K0), (U0), (A1)-(A10) and (B1)-(B5) in
Section 5.1 hold and that the underlying $A(u)$ has continuous second
derivatives and satisfies condition (B6). If there exist positive constants
$b_0$, $b_1$ and $\eta < 1/2$ such that $b_0 < hn^{\eta} < b_1$, then,
uniformly for $u_0 \in \Omega$,

$\hat\beta(u_0) - \beta_A(u_0) = \frac{1}{n}\sum_{i=1}^n K_h(U_i - u_0)\{S \otimes \Gamma(u_0)\}^{-1} Z\left(X_i, \frac{U_i - u_0}{h}\right)\eta_i(u_0)\{1 + o_p(h^{1/2})\} + O_p(h^2),$

$\hat\alpha(u_0) = \frac{1}{n}\sum_{i=1}^n K_h(U_i - u_0)\{C(u_0)G(\varepsilon_i)\} \otimes \left[\{S \otimes \Gamma(u_0)\}^{-1} Z\left(X_i, \frac{U_i - u_0}{h}\right)\right]\{1 + o_p(h^{1/2})\} + O_p(h^2).$


As a consequence of Theorem 1, we have the following uniform asymptotic
expansion:

$\hat A(u_0) - A(u_0) = \frac{1}{n}\sum_{i=1}^n K_h(U_i - u_0)\Gamma^{-1}(u_0)X_i\eta_i(u_0)\{1 + o_p(h^{1/2})\} + O_p(h^2).$

The asymptotic normality of the local sieve empirical likelihood estimator
follows easily from the above asymptotic expansion.

In Theorem 1, the requirement that $G$ be differentiable can be relaxed by
imposing some entropy conditions on $G$ and by assuming that
$E[G(\varepsilon - t) \mid U = u_0]$ is twice continuously differentiable in
$t$. In this case $D(u_0)$ should be replaced by
$\{dE[G(\varepsilon - t) \mid U = u_0]/dt\}|_{t=0}$, which equals
$-E[dG(\varepsilon)/d\varepsilon \mid U = u_0]$ when $G$ is differentiable.
Similarly to Zhang and Liu (2003), we can show that the asymptotic efficiency
of $\hat A(u_0)$ is increasing in $D(u_0)^T V(u_0)^{-1} D(u_0)$. In
particular, in the setting of the symmetric location model mentioned in
Section 1, we can find a sequence of $G$ functions, say $\{G_{k_0}\}$, such
that the corresponding $\hat A(u_0)$ is asymptotically adaptive to the
unknown conditional density of $\varepsilon$ given $U = u_0$. In practice,
to save computational effort, we prefer to choose a $G$ with a small $k_0$
and a relatively large $D(u_0)^T V(u_0)^{-1} D(u_0)$.

It should be noted that, under the conditions of Theorem 1, for $\beta$ near
its true value, $\alpha_n(u_0, \beta)$ is uniquely determined by the
estimating equations. Thus, the number of unknown parameters is $2p$ for each
$u_0$. It is well known that, to make the local linear model regular, the
interval $[u_0 - h, u_0 + h]$ should include at least $2p + 1$ data points of
$U$. This condition holds asymptotically under the conditions of Theorem 1
because, as $n \to \infty$,

$P\left\{\sum_{i=1}^n I(u_0 - h \le U_i \le u_0 + h) \ge 2p + 1\right\}$

$= P\left\{\sum_{i=1}^n [I(u_0 - h \le U_i \le u_0 + h) - EI(u_0 - h \le U \le u_0 + h)] + nEI(u_0 - h \le U \le u_0 + h) \ge 2p + 1\right\}$

$\ge P\{nEI(u_0 - h \le U \le u_0 + h) \ge 2p + 1 + \delta\} - nE\{I(u_0 - h \le U \le u_0 + h) - EI(u_0 - h \le U \le u_0 + h)\}^2/\delta^2$

$\to 1,$

where $\delta = n^{(1+2\eta)/2}h$ and $I(\cdot)$ is the indicator function. We
can further show that this condition actually holds uniformly in $u_0$ by an
approach using empirical processes.

We now give the asymptotic representations for the SELR statistics $l(G)$
and $l(H_{0s} \mid G)$. The results indicate that they admit a generalized
quadratic form. To facilitate the expressions, we introduce the following
notation. Let

$\phi_{ikh}(U) = K_h(U_i - U)K_h(U_k - U)C(U)\{1 + (U_i - U)(U_k - U)\mu_2^{-1}h^{-2}\}X_i^T\Gamma^{-1}(U)X_k f^{-1}(U),$

(3.1) $K^*(s) = \int K(t)K(s + t)\{1 + t(s + t)\mu_2^{-1}\}\,dt,$

$\Phi_{ikh} = E[\phi_{ikh}(U) \mid (U_i, U_k, X_i, X_k)]$

(3.2) $\quad = K_h^*(U_k - U_i)C(U_i)X_i^T\Gamma^{-1}(U_i)X_k\{1 + O_p(h)\},$

$T_n = \frac{1}{n(n-1)}\sum_{i \ne k} G^T(\varepsilon_i)\Phi_{ikh}G(\varepsilon_k).$

Similarly, we define

$q_{ikh}(U) = K_h(U_i - U)K_h(U_k - U)V^{-1}(U)X_i^T\Gamma^{-1}(U)X_k\{1 + (U_i - U)(U_k - U)\mu_2^{-1}h^{-2}\}f^{-1}(U),$

$Q_{ikh} = E[q_{ikh}(U) \mid (U_i, U_k, X_i, X_k)],$

$T_n^* = \frac{1}{n(n-1)}\sum_{i \ne k} G^T(\varepsilon_i)(Q_{ikh} - \Phi_{ikh})G(\varepsilon_k).$

Then we have the following result.

Theorem 2. Suppose the conditions of Theorem 1 hold. Then under $H_{0g}$,

(3.3) $2l(G) = \frac{(k_0 - 1)p|\Omega|}{h}\int K^2(t)(1 + t^2\mu_2^{-1})\,dt + \{1 + o_p(h^{1/2})\}nT_n + o_p(h^{-1/2});$

and under $H_{0s}$, if $A_0$ is linear or $nh^{9/2} \to 0$, then

(3.4) $2l(H_{0s} \mid G) = \frac{p|\Omega|}{h}\int K^2(t)(1 + t^2\mu_2^{-1})\,dt + \{1 + o_p(h^{1/2})\}nT_n^* + o_p(h^{-1/2}),$

where $|\Omega|$ is the length of the support $\Omega$ of the density $f$.
Note that if there are no components in $A$, then under $H_{0g}$ the factor
$k_0 - 1$ in (3.3) should be $k_0$, since it costs $p$ degrees of freedom to
estimate them when there are $p$ components in $A$.

3.2. Asymptotic null distribution. With the asymptotic representations, we
are now ready to derive the asymptotic distributions of the test statistics
$l(G)$ and $l(H_{0s} \mid G)$. As in the parametric case for the stochastic
error $\varepsilon$ [see Fan, Zhang and Zhang (2001)], under the null
hypotheses the SELR statistics in (2.7), (2.9) and (2.11) are asymptotically
$\chi^2$-distributed, and their degrees of freedom are independent of
nuisance parameters such as $A$, $G$ and the distribution of $\varepsilon$.

Theorem 3. Under $H_{0g}$ and the conditions of Theorem 1, for $k_0 > 1$, we
have $r_K l(G) \overset{a}{\sim} \chi^2_{b_n}$ with

$r_K = \frac{2K^*(0)}{\int K^*(s)^2\,ds}, \quad b_n = \frac{(k_0 - 1)p|\Omega|c_K}{h},$

where $K^*(s)$ is defined in (3.1) and $c_K = K^*(0)^2/\int K^*(s)^2\,ds$. For
$k_0 = 1$, we have $r_K l(G) = o_p(1)$.
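Theorem 3 can be turned into a rough P-value through the normal form of the rescaled $\chi^2$ approximation given in Section 1, $(2b_n)^{-1/2}(r_K\lambda - b_n) \to N(0, 1)$. Section 3.4 cautions that this convergence is slow and recommends simulation instead, so the sketch below is only a quick sanity check, not the recommended procedure. The function names are hypothetical. The inputs ($k_0 = 3$, $p = 2$, $|\Omega| = 1$, $h = 0.2$) are illustrative, and $c_K$ is the Epanechnikov value reported in Remark 1 below.

```python
import math


def degrees_of_freedom(k0, p, support_length, c_k, h):
    """b_n = (k0 - 1) p |Omega| c_K / h, as in Theorem 3."""
    return (k0 - 1) * p * support_length * c_k / h


def wilks_p_value(stat, r_k, b_n):
    """Upper-tail P-value for r_K * stat ~ chi^2_{b_n}, using the normal
    approximation (2 b_n)^{-1/2} (r_K * stat - b_n) -> N(0, 1)."""
    z = (r_k * stat - b_n) / math.sqrt(2.0 * b_n)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(N(0, 1) > z)


b_n = degrees_of_freedom(k0=3, p=2, support_length=1.0, c_k=1.2936, h=0.2)
print(round(b_n, 2))  # 25.87
```

Because $b_n$ grows like $1/h$, the same observed statistic can be significant for one bandwidth and not for another, which is one reason the choice of $h$ is discussed separately in Section 3.4.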

Remark 1. If $K(t)$ has support $[-1, 1]$, and if $K(t)$ and $|t|K(t)$ are
concave on $t \in [-1, 1]$, then by the same argument used in the Sherman
inequality [see Farrell (1985), page 343], we have

$|K^*(s)| \le \int K(t)K(s + t)\,dt + \mu_2^{-1}\int |t|K(t)|s + t|K(s + t)\,dt \le K^*(0).$

Thus, when $K^*(s) > 0$ for $s \in [-1, 1]$, $r_K > 2$. In particular, when
$K$ is the uniform kernel function, $r_K = 2.8176$ and $c_K = 1.0566$; when
$K$ is the Epanechnikov kernel function, $r_K = 2.5154$ and $c_K = 1.2936$.
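The equivalent kernel $K^*(s)$ in (3.1), which enters $r_K$ and $c_K$, is a one-dimensional integral and is easy to evaluate numerically. The pure-Python sketch below uses a plain midpoint rule on $[-1, 1]$, which is an assumption (any quadrature would do). It is applied to the uniform kernel $K(t) = 1/2$ and the Epanechnikov kernel $K(t) = 3(1 - t^2)/4$ on $[-1, 1]$. Direct integration gives $K^*(0) = 1$ and $K^*(0) = 36/35$, respectively. For the uniform kernel, $K^*(s) = 1 - |s| + |s|^3/8$ on $[-2, 2]$. The function names are illustrative.

```python
def uniform(t):
    """Uniform kernel on [-1, 1]."""
    return 0.5 if abs(t) <= 1.0 else 0.0


def epanechnikov(t):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b] with n cells."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


def k_star(kernel, s, n=20000):
    """K*(s) = int K(t) K(s + t) {1 + t (s + t) / mu_2} dt for K supported on [-1, 1]."""
    mu2 = midpoint(lambda t: t * t * kernel(t), -1.0, 1.0, n)
    return midpoint(lambda t: kernel(t) * kernel(s + t) * (1.0 + t * (s + t) / mu2),
                    -1.0, 1.0, n)


print(round(k_star(uniform, 0.0), 3))       # 1.0
print(round(k_star(epanechnikov, 0.0), 3))  # 1.029, i.e. 36/35
```

Integrating `k_star(kernel, s) ** 2` over $s \in [-2, 2]$ with the same rule then gives the denominators of $r_K$ and $c_K$ in Theorem 3.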

The next theorem presents the asymptotic null distribution of
$l(H_{0s} \mid G)$.

Theorem 4. Suppose that the conditions of Theorem 1 hold. Then under
$H_{0s}$, $r_K l(H_{0s} \mid G) \overset{a}{\sim} \chi^2_{b_n^*}$ if $A_0$ is
linear or $nh^{9/2} \to 0$; and under $H_{0u}$, if $nh^{9/2} \to 0$, then
$r_K l(H_{0u} \mid G) \overset{a}{\sim} \chi^2_{b_{n2}^*}$, where
$b_n^* = p|\Omega|c_K/h$ and $b_{n2}^* = p_1|\Omega|c_K/h$, with $c_K$ and
$r_K$ defined in Theorem 3 and $p_1$ being the dimensionality of $A_{10}$ in
(2.8).

When $nh^{9/2} = O(1)$, it is easily proved as in Fan, Zhang and Zhang (2001)
that under $H_{0u}$ the Wilks phenomenon continues to hold in the generalized
sense that the mean and variance of the SELR statistic are independent of the
nuisance parameters to the first order. As pointed out in Section 2, when
$A_0$ in $H_{0s}$ is known (or, more generally, has a parametric form), we can
make a simple transformation (or use some bias reduction technique) to remove
the bias. Theorems 3 and 4 indicate that the SELR statistics continue to
apply to the case where the distribution of the stochastic error
$\varepsilon$ is completely unknown and, furthermore, there are many nuisance
parameters in the null hypotheses (see Section 3.4). In particular, the
stochastic errors are allowed to be heteroscedastic and unknown. This is a
useful generalization of the results in Fan, Zhang and Zhang (2001), where
the distribution of $\varepsilon$ is essentially known. In particular, if the
variance is heteroscedastic with $\operatorname{var}(\varepsilon \mid U) = \sigma^2(U)$,
they have to rely on knowledge of $\sigma^2(\cdot)$ to construct the
likelihood ratio statistics. This drawback is repaired by the empirical
likelihood ratio method, while the Wilks phenomenon is inherited.


3.3. Asymptotic power. To demonstrate the effectiveness of the sieve
empirical likelihood method, we consider, for simplicity, the test statistic
for problem (2.12) under the contiguous alternative $A_n(\cdot) \to 0$, with
$A_n''(\cdot)$ being bounded; that is, we allow the coefficient functions to
be close to the null hypothesis but still in the class of functions with
bounded and continuous second derivatives. This is a much weaker restriction
than the contiguous alternatives of the form $A_n(u) = a_n B_0(u)$ for a
sequence $a_n \to 0$ and a given $B_0$, considered by many authors [e.g.,
Eubank and Hart (1992), Eubank and LaRiccia (1992), Hart (1997) and Inglot
and Ledwina (1996)]. The latter implicitly assumes that $A_n'(u) \to 0$ and
$A_n''(u) \to 0$, which are too restrictive for nonparametric applications.

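We begin with the following notation. Let

(3.5) $W_{1n}^* = \frac{1}{n}\sum_{i \ne k} K_h(U_i - U_k)G^T(\varepsilon_i)V^{-1}(U_k)\frac{dG(\varepsilon_k)}{d\varepsilon}X_i^T\Gamma^{-1}(U_k)X_k A(U_k)^T X_k,$

(3.6) $\Xi_i = \frac{dG(\varepsilon_i)}{d\varepsilon} - E\left[\frac{dG(\varepsilon_i)}{d\varepsilon} \,\middle|\, U_i\right],$

(3.7) $W_{2n}^* = \frac{1}{n}\sum_{i \ne k} K_h(U_i - U_k)\Xi_i^T V^{-1}(U_i)\Xi_k A(U_i)^T X_i X_i^T\Gamma^{-1}(U_k)X_k X_k^T A(U_k),$

(3.8) $W_{3n}^* = \frac{1}{n}\sum_{i \ne k} K_h^*(U_i - U_k)\Xi_i^T V^{-1}(U_k)E\left[\frac{dG(\varepsilon_k)}{d\varepsilon} \,\middle|\, U_k\right]A(U_i)^T X_i X_i^T\Gamma^{-1}(U_k)X_k X_k^T A(U_k).$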
Then, following the same arguments used in Fan, Zhang and Zhang (2001), we
can derive the asymptotic power of $l(H_{0s} \mid G)$ via the next theorem.

Theorem 5. Assume that $A_0 = 0$ and that the underlying coefficient
$A = A_n$ has continuous second derivatives and satisfies
$nh\,E\{A(U)^T XX^T A(U)\} = O(1)$, $\max_u \|A(u)\| \to 0$ and
$\max_u \|A''(u)\| = O(1)$ as $n \to \infty$. Assume that $G$ is twice
continuously differentiable. Then, under the conditions of Theorem 1,

$2l(H_{0s} \mid G) = \frac{p|\Omega|}{h}K^*(0) + nE\{D(U)^T V^{-1}(U)D(U)A(U)^T XX^T A(U)\}\{1 + o(1)\}$

$\qquad - \frac{nh^4}{4}E\{D(U)^T C(U)D(U)A''(U)^T XX^T A''(U)\}\int\!\!\int t^2(s + t)^2K(t)K(s + t)\{1 + \mu_2^{-1}t(s + t)\}\,dt\,ds\,\{1 + o(1)\}$

$\qquad + \{1 + o_p(h^{1/2})\}(nT_n^* + 2W_{1n}^* + W_{2n}^* + 2W_{3n}^*) + o_p(h^{-1/2}),$

where $D$, $V$, $C$ and $K^*$ are defined in Section 3.1.

Using the above result, as in Fan, Zhang and Zhang (2001), it can be shown that for testing $H_{0s}$ the SELR test can detect alternatives converging to the null at rate $n^{-4/9}$ when $h = c^* n^{-2/9}$ for some constant $c^*$. This rate is optimal in the ordinary nonparametric regression setting. Note that the above result continues to hold for the composite null hypothesis testing problem (2.13) when $A_0$ is a set of linear functions.
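A heuristic reading of Theorem 5 indicates where this rate comes from. The following is a sketch only, assuming that the size of the alternative is measured by $\rho_n^2 = E\{A(U)^T XX^T A(U)\}$, that the centered stochastic part of $2l(H_{0s}|G)$ is of order $h^{-1/2}$ under the null hypothesis (the square root of degrees of freedom of order $h^{-1}$), and that the bias term of order $nh^4$ must not dominate the signal term $n\rho_n^2$:

$$
n\rho_n^2 \gg h^{-1/2} \quad\text{and}\quad \rho_n^2 \gtrsim h^{4}
\;\Longrightarrow\; \rho_n^2 \gtrsim \max\{n^{-1}h^{-1/2},\, h^{4}\},
$$

$$
n^{-1}h^{-1/2} = h^{4} \iff h^{9/2} = n^{-1} \iff h = n^{-2/9},
\qquad\text{so}\qquad \rho_n^2 \asymp n^{-8/9},\quad \rho_n \asymp n^{-4/9}.
$$

Balancing the two lower bounds is what singles out the bandwidth order $n^{-2/9}$ used in Section 4.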

3.4. Remarks on practical implementations. Several issues arise in practical implementations of the procedure, including computing P-values, choosing the bandwidth, choosing the support of U and bias reduction. We now briefly discuss them.

P-values depend on the null distributions of the test statistics. The convergence of the null distributions of the SELR statistics to their asymptotic limits is expected to be slow. Thus, we do not suggest using the asymptotic null distributions. Instead, we use simulation methods (a form of bootstrap). Thanks to Theorems 3 and 4, we can simulate the null distributions by fixing nuisance parameters or functions under the null hypothesis at certain values of interest. This gives better approximations to the null distributions. We have conducted an intensive simulation study, reported in Section 4. The results show that for a sample size of 200 or more, the approach gives very reasonable approximations of the null distribution.
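The simulation approach to P-values can be sketched as a generic Monte Carlo computation. This is a minimal sketch under stated assumptions: `simulate_null_stat` is a hypothetical user-supplied function that regenerates one data set under the null hypothesis (with the nuisance parameters or functions held fixed, as Theorems 3 and 4 allow) and returns the SELR statistic; the (1 + count)/(B + 1) form is one common Monte Carlo convention, not necessarily the one used by the authors.

```python
import numpy as np


def monte_carlo_pvalue(stat_obs, simulate_null_stat, n_rep=1000, seed=None):
    """Simulation-based P-value for a test that rejects for large values.

    simulate_null_stat(rng) is a hypothetical callable returning one draw of
    the statistic computed on data generated under the null hypothesis.
    Returns (P-value, array of simulated null statistics).
    """
    rng = np.random.default_rng(seed)
    null_draws = np.array([simulate_null_stat(rng) for _ in range(n_rep)])
    # The "+1" terms count the observed statistic itself, which keeps the
    # P-value strictly positive for a finite number of replications.
    p_value = (1 + np.sum(null_draws >= stat_obs)) / (n_rep + 1)
    return float(p_value), null_draws
```

The same simulated draws can also be reused to read off critical values, for example via `np.quantile(null_draws, 0.95)` for a 5% test.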
The SELR test depends on the choice of the bandwidth h; it can be regarded as a family of test statistics indexed by h. A thorough discussion of this subject is beyond the scope of this study. Inspired by the adaptive Neyman test in Fan (1996), which has been shown to be adaptive minimax by Fan and Huang (2001), one can possibly use the following criterion to choose a bandwidth: for some constants $a > b > 0$, a bandwidth $h \in [n^{-a}, n^{-b}]$ is selected to maximize

$$\frac{r_0\, l(H_{0s}|G) - d_n(h)}{\sqrt{2 d_n(h)}},$$

where $r_0$ is the normalizing constant and $d_n(h)$ is the degrees of freedom (see Theorem 4). This results in a multi-scale test:

$$\frac{r_0\, l(H_{0s}|G) - d_n(\hat h)}{\sqrt{2 d_n(\hat h)}} = \max_{h \in [n^{-a}, n^{-b}]} \frac{r_0\, l(H_{0s}|G) - d_n(h)}{\sqrt{2 d_n(h)}}.$$

Such an idea was proposed in Fan, Zhang and Zhang [(2001), page 175] and in Horowitz and Spokoiny (2002) for the median regression problem, and it was shown to possess the adaptive optimal rate of convergence [Horowitz and Spokoiny (2002)]. It has also been studied and implemented by Zhang (2003). In many empirical applications, the bandwidths used for nonparametric function estimation have also frequently been employed for nonparametric hypothesis testing. The difference between the optimal bandwidth $O(n^{-1/5})$ for function estimation and $O(n^{-2/9})$ for hypothesis testing is hardly noticeable for practical sample sizes; for example, with $n = 200$, $n^{-1/5} \approx 0.35$ while $n^{-2/9} \approx 0.31$.
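The multi-scale statistic can be sketched as below. The inputs (SELR values over a bandwidth grid, the degrees of freedom $d_n(h)$ and $r_0$) are assumed to be computed elsewhere, and the geometric grid over $[n^{-a}, n^{-b}]$ is an illustrative choice rather than one prescribed in the text. Because a maximum over bandwidths no longer has the null distribution of a single standardized statistic, its null distribution would itself be obtained by simulation, as recommended above.

```python
import numpy as np


def bandwidth_grid(n, a, b, m):
    """m bandwidths spaced geometrically over [n^-a, n^-b] (assumes a > b > 0)."""
    return np.geomspace(n ** (-a), n ** (-b), m)


def multiscale_statistic(stats_by_h, dof_by_h, r0):
    """Max over the grid of (r0 * l_h - d_n(h)) / sqrt(2 d_n(h)).

    stats_by_h : SELR values, one per bandwidth in the grid
    dof_by_h   : degrees of freedom d_n(h) for the same bandwidths
    r0         : normalizing constant
    Returns (maximal standardized value, index of the maximizing bandwidth).
    """
    stats = np.asarray(stats_by_h, dtype=float)
    dof = np.asarray(dof_by_h, dtype=float)
    z = (r0 * stats - dof) / np.sqrt(2.0 * dof)
    k = int(np.argmax(z))
    return float(z[k]), k
```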

When U has unbounded support, we cannot estimate the coefficient functions A(·) in the tails with reasonable accuracy. In other words, we do not have enough data to test the form of the coefficient functions in the tails. Due to this limitation, a reduced problem needs to be considered: testing the form of the coefficient functions A(·) on a given interval. Our procedures continue to apply, and |Ω| becomes the length of the given interval.

When $A_0$ in (2.10) is of the parametric form $A(\cdot, \theta)$ and is nonlinear, the local linear estimate will be biased even under the null hypothesis. The bias is made negligible by requiring $nh^{9/2} \to 0$ in the second part of Theorem 4. This is an unrealistic assumption, as pointed out by a referee. However, as discussed in Section 2, we should employ a bias reduction technique before applying the SELR test. Let $\hat\theta$ be a root-n consistent estimator under the parametric model. The error of the parametric fit is usually negligible in nonparametric applications. By regarding $A(\cdot, \hat\theta)$ as $A_0$ in (2.10), we can reduce the problem to (2.12). For problem (2.12), the local linear fit has no bias under the null hypothesis. Hence, the condition $nh^{9/2} \to 0$ is not required to eliminate the bias. The bias reduction is also helpful in reducing the approximation errors of the null distribution.
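For the scalar model used in Section 4, the bias-reduction step can be sketched as follows. The parametric null family is hypothetically taken to be a polynomial in u fitted by ordinary least squares, which gives a root-n consistent $\hat\theta$ within that family; the residual response $Y - a_1(U; \hat\theta)$ is then tested against the zero function, as in problem (2.12). The function name and the polynomial family are illustrative assumptions, not part of the paper.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def parametric_bias_reduction(y, u, degree=1):
    """Subtract a least-squares polynomial fit a_1(u; theta_hat) from Y.

    Returns (Y - a_1(U; theta_hat), theta_hat), with theta_hat ordered from
    the constant term upward. The residual response is what the SELR test
    would then compare with the zero function.
    """
    theta_hat = P.polyfit(u, y, degree)
    return y - P.polyval(u, theta_hat), theta_hat
```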
In summary, for practical implementations, the following steps are recommended:

1. Apply the bias correction method described in the last paragraph.

2. Choose an interval on which the functions are to be tested. This is the set Ω.

3. Choose an appropriate bandwidth, using the methods suggested above, and construct the SELR statistic.

4. Apply the simulation (bootstrap) method above to obtain the null distribution of the test statistic.


4. Simulation. In this section the performance of the SELR test is evaluated by simulation for a simplified conditional regression model. In this study, several bandwidths (i.e., $h = c_0 n^{-2/9}$, with $c_0 = 1$ and 1.5 for the sample size n = 100; with $c_0 = 0.5, 1, 1.5$ and 2 for n = 200 and 400; with $c_0 = 0.55, 1, 1.5$ and 2 for n = 800; and with $c_0 = 0.2, 0.35, 0.55$ and 2 for n = 1600) are used to represent widely varying degrees of smoothness. Due to space limitations, only part of the results is presented. The triweight function $(1 - t^2)^3_+$ is selected as the kernel function in the proposed test.
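The bandwidth family can be written out directly. Evaluating $h = c_0 n^{-2/9}$ for the constants listed above reproduces, to five decimals, the bandwidths reported in Table 1 (for example, $200^{-2/9} \approx 0.30808$ and $1.5 \times 800^{-2/9} \approx 0.33959$). The triweight kernel is written here without its normalizing constant; the text does not say whether one was used, and a constant factor would only rescale $K_h$.

```python
import numpy as np


def simulation_bandwidth(n, c0):
    """Bandwidth h = c0 * n^(-2/9) used in the simulation study."""
    return c0 * n ** (-2.0 / 9.0)


def triweight(t):
    """Unnormalized triweight kernel (1 - t^2)^3 on (-1, 1), zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < 1.0, (1.0 - t**2) ** 3, 0.0)


# Constants c0 quoted in the text (n = 1600 is listed there but not tabulated).
grids = {100: [1, 1.5], 200: [0.5, 1, 1.5, 2],
         400: [0.5, 1, 1.5, 2], 800: [0.55, 1, 1.5, 2]}
for n, c0s in grids.items():
    print(n, [round(simulation_bandwidth(n, c0), 5) for c0 in c0s])
```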

For simplicity of exposition and computation, we take the simple model

$$Y = a_1(U) + \varepsilon,$$

where $E[\varepsilon|U] = 0$ [i.e., $G(\varepsilon) = \varepsilon$ in (1.4)] and U is uniformly distributed over [0, 1], although the results hold for more general varying-coefficient models. Consider the problem of nonparametric testing of significance:

$$H_0: a_1(\cdot) = 0.$$

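The SELR test of $H_0$ can be expressed as

$$l(\Theta_{0s}|G) = \sum_{i=1}^{n}\sum_{j=1}^{n} w_h(U_i, U_j) \log\{1 + \alpha(U_j, 0)^T G_{ih}(U_j, 0)\},$$

where $\alpha(U_j, 0)$ satisfies

$$\sum_{i=1}^{n} \frac{w_h(U_i, U_j)\, G_{ih}(U_j, 0)}{1 + \alpha(U_j, 0)^T G_{ih}(U_j, 0)} = 0$$

with

$$G_{ih}(U_j, 0) = Y_i \Big(1, \frac{U_i - U_j}{h}\Big)^T.$$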
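The statistic above can be sketched numerically. Assumptions not stated in the text: the weights are taken as $w_h(U_i, U_j) = K((U_i - U_j)/h)/h$ with the triweight K, and each multiplier $\alpha(U_j, 0)$ is found by damped Newton iterations on the concave function $f(\alpha) = \sum_i w_h(U_i, U_j)\log(1 + \alpha^T G_{ih})$, whose first-order condition is exactly the displayed equation for $\alpha$. This is an illustrative sketch, not the authors' code; the function names are hypothetical.

```python
import numpy as np


def triweight(t):
    """Unnormalized triweight kernel (1 - t^2)^3 on (-1, 1)."""
    return np.where(np.abs(t) < 1.0, (1.0 - t**2) ** 3, 0.0)


def el_log_ratio(g, w, max_iter=100, tol=1e-10):
    """Weighted empirical log-likelihood ratio at one design point.

    g : (m, q) values of the estimating function G_ih(U_j, 0)
    w : (m,) positive weights w_h(U_i, U_j)
    Returns (sum_i w_i log(1 + alpha^T g_i), alpha), where alpha maximizes the
    concave f(alpha) = sum_i w_i log(1 + alpha^T g_i). A finite maximizer
    exists only if 0 lies inside the convex hull of the g_i.
    """
    alpha = np.zeros(g.shape[1])

    def f(a):
        return np.sum(w * np.log1p(g @ a))

    for _ in range(max_iter):
        denom = 1.0 + g @ alpha
        grad = (w / denom) @ g
        hess = -(g * (w / denom**2)[:, None]).T @ g
        step = np.linalg.solve(hess, grad)  # Newton update is alpha - step
        t = 1.0
        # Damping: halve the step until all 1 + alpha^T g_i stay positive
        # and the objective does not decrease.
        while t > 1e-12 and (np.any(1.0 + g @ (alpha - t * step) <= 1e-10)
                             or f(alpha - t * step) < f(alpha)):
            t *= 0.5
        alpha = alpha - t * step
        if np.linalg.norm(t * step) < tol:
            break
    return float(f(alpha)), alpha


def selr_significance(y, u, h):
    """Sketch of l(Theta_0s|G) for H0: a_1 = 0 in Y = a_1(U) + e."""
    total = 0.0
    for uj in u:
        t = (u - uj) / h
        w = triweight(t) / h  # assumed w_h(U_i, U_j) = K((U_i - U_j)/h)/h
        keep = w > 0
        g = y[keep, None] * np.column_stack([np.ones(keep.sum()), t[keep]])
        total += el_log_ratio(g, w[keep])[0]
    return total
```

Since every local term is maximized starting from $\alpha = 0$, each contribution is nonnegative; a larger departure of $a_1$ from zero produces a larger statistic.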

Note that Theorems 3 and 4 imply that the null distribution of $l(\Theta_{0s}|G)$ is asymptotically independent of the underlying distribution of ε. So, without loss of generality, we assume that, given U, the stochastic error follows a normal distribution.
To examine the effect of possible heteroscedasticity of ε on the above null distributions, the conditional variance of ε given U = u is taken to have the form $1 + c_1u^2$, where the constant $c_1$ represents the noise level. By generating 100 independent samples of (Y, U) of size n, we calculate the null distributions for several quite different values of $c_1$, which represent widely varying degrees of heteroscedasticity of ε. This results in 100 i.i.d. simulated values of $l(\Theta_{0s}|G)$ for each combination of n and $c_1$. The corresponding sample means and standard deviations of $l(\Theta_{0s}|G)$ summarize the distributions of the test statistics under the null hypothesis and are reported in Table 1. They do not depend strongly on the choice of the constant $c_1$. As an illustration, the resulting 24 empirical distributions for n = 400 and n = 800 are depicted in Figure 1. Clearly, they are very close as $c_1$ varies from 0 to $10^5$ for each case of (n, h). As expected, they depend on the bandwidth h. This suggests that the asymptotic null distribution of $l(\Theta_{0s}|G)$ is not very sensitive to the heteroscedasticity of the stochastic error. To check whether the scaled SELR statistics asymptotically follow a $\chi^2$-distribution, we equate the mean and variance of the scaled SELR, $r_0 l(\Theta_{0s}|G)$, to the corresponding mean and variance of a chi-squared random variable, say $\chi^2_{d_0}$ with $d_0$ degrees of freedom. This results in $r_0 = 2\mu/\sigma^2$ and $d_0 = 2\mu^2/\sigma^2$, with μ and $\sigma^2$ the simulated mean and variance of $l(\Theta_{0s}|G)$. We further calculated the empirical distribution of the scaled SELR and compared it with the $\chi^2_{d_0}$-distribution for each combination of (n, h). Since the empirical distributions do not depend sensitively on the conditional variance function, only one of them was used for comparison. As an example, Figure 2 depicts the two distributions for the case

$$(n, h) = (800,\ 1.5 \times 800^{-2/9}) \quad\text{and}\quad c_1 = 1.$$

They are indeed very close. This demonstrates empirically the accuracy of approximating the null distribution of the proposed SELR statistic by the $\chi^2$-distribution. We also conducted a similar simulation study for testing homogeneity:

$$H_{0p}: a_1(\cdot) = \theta.$$

It again shows that the Wilks phenomenon continues to hold for this composite null hypothesis testing problem. The details are not reported here.
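The moment-matching step can be checked with a few lines of arithmetic. Solving $r_0\mu = d_0$ and $r_0^2\sigma^2 = 2d_0$ gives the two formulas above. As an illustration, plugging in the Table 1 entry for n = 800, h = 0.33959 and conditional variance $1 + u^2$ (μ = 2.103, σ = 1.165) gives $r_0 \approx 3.10$ and $d_0 \approx 6.52$, close to the 6.51 degrees of freedom shown in Figure 2; the small gap is presumably due to rounding of the tabulated values. The helper name is hypothetical.

```python
def chi2_moment_match(mean, sd):
    """Match r0 * l to a chi-squared variable with d0 degrees of freedom.

    Solving r0 * mean = d0 and r0**2 * sd**2 = 2 * d0 gives
    r0 = 2 * mean / sd**2 and d0 = 2 * mean**2 / sd**2.
    """
    var = sd ** 2
    r0 = 2.0 * mean / var
    d0 = 2.0 * mean ** 2 / var
    return r0, d0


# Table 1: n = 800, h = 0.33959, conditional variance 1 + u^2.
r0, d0 = chi2_moment_match(2.103, 1.165)
print(f"r0 = {r0:.3f}, d0 = {d0:.2f}")
```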

To conclude this section, the power functions of the proposed test of $H_0$ are estimated and compared with those of the commonly used F-type test statistic

$$F_{0s} = (\mathrm{RSS}_0 - \mathrm{RSS}_1)/\mathrm{RSS}_1$$

[see, e.g., Fan, Zhang and Zhang (2001), page 155, for the definition], based on 100 simulations for the sample sizes n = 200 and 800 under two sequences of alternatives indexed by r. One is

$$H_1: a_1(u) = r(u - 0.5), \qquad r = 0.1, 0.2, \ldots, \tag{4.1}$$
Table 1
Summary of simulation results. μ and σ are the simulated mean and standard deviation of the SELR statistic based on 100 repetitions; n is the sample size and h is the bandwidth.

                          Conditional variances
                    1              1 + u²          1 + 10u²
   n       h        μ      σ       μ      σ        μ      σ
  100   0.35938   1.868  1.221   2.161  1.788    1.954  1.252
  100   0.53907   1.754  1.480   1.767  1.642    1.646  1.219

                          Conditional variances
                    1              1 + 10u²        1 + 100u²
   n       h        μ      σ       μ      σ        μ      σ
  200   0.15404   4.371  2.495   4.393  2.468    3.973  2.334
  200   0.30808   2.463  1.527   2.263  1.421    2.215  1.413
  200   0.46212   1.655  1.124   1.698  1.130    1.329  0.840
  200   0.61616   1.376  1.081   1.519  1.242    1.395  0.976
  400   0.13205   5.019  2.019   4.487  1.977    4.459  1.968
  400   0.26410   3.081  1.720   2.681  1.361    2.965  1.433
  400   0.39615   2.007  1.271   1.961  1.246    2.192  1.307
  400   0.52820   1.867  1.492   1.622  1.126    1.743  1.501

                          Conditional variances
                    1              1 + u²          1 + 10⁵u²
   n       h        μ      σ       μ      σ        μ      σ
  800   0.12452   5.080  1.774   4.950  1.611    4.807  1.557
  800   0.22640   3.092  1.354   3.191  1.457    3.093  1.455
  800   0.33959   2.220  1.171   2.103  1.165    2.038  1.080
  800   0.45279   1.785  1.078   1.627  1.069    1.699  1.069

and the other is

$$H_1: a_1(u) = r\{2\sin^2(2\pi u) - 1\}, \qquad r = 0.1, 0.2, \ldots. \tag{4.2}$$
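The data-generating mechanism of this study and the F-type comparison statistic can be sketched as below. The sketch assumes $\mathrm{RSS}_0 = \sum_i Y_i^2$ (the fit under $H_0: a_1 = 0$) and $\mathrm{RSS}_1$ from a local linear fit with the same triweight kernel and bandwidth; this is a plausible reading of the definition cited from Fan, Zhang and Zhang (2001), not a transcription of it, and all function names are hypothetical.

```python
import numpy as np


def simulate_model(n, r, c1, alternative, rng):
    """Draw (U, Y) with Y = a_1(U) + e, U ~ Uniform[0, 1] and
    e | U = u ~ N(0, 1 + c1 * u^2); a_1 follows (4.1) or (4.2)."""
    u = rng.uniform(0.0, 1.0, n)
    if alternative == 1:
        a1 = r * (u - 0.5)                                   # (4.1)
    else:
        a1 = r * (2.0 * np.sin(2.0 * np.pi * u) ** 2 - 1.0)  # (4.2)
    e = np.sqrt(1.0 + c1 * u**2) * rng.standard_normal(n)
    return u, a1 + e


def local_linear_fit(u, y, h):
    """Local linear estimate of a_1 at each U_j with a triweight kernel."""
    fit = np.empty_like(y)
    for j, uj in enumerate(u):
        d = u - uj
        w = np.where(np.abs(d / h) < 1.0, (1.0 - (d / h) ** 2) ** 3, 0.0)
        sw = np.sqrt(w)
        design = np.column_stack([np.ones_like(d), d])
        coef = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
        fit[j] = coef[0]  # the local intercept is the fitted value at U_j
    return fit


def f_type_statistic(u, y, h):
    """F_0s = (RSS_0 - RSS_1) / RSS_1 with RSS_0 = sum of Y_i^2 under H0."""
    rss0 = np.sum(y**2)
    rss1 = np.sum((y - local_linear_fit(u, y, h)) ** 2)
    return (rss0 - rss1) / rss1
```

Because the random draws are taken in the same order for every r, a fixed seed gives the same U and errors across the alternative sequence, which makes power curves over r smoother to estimate.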


Table 2
Empirical sizes of SELR and F-type tests. The probabilities are computed based on 100 simulations; $h = n^{-2/9}$ for n = 200 and $h = 1.5n^{-2/9}$ for n = 800. $c_r$ denotes the critical value used.

              Sizes of SELR test                         Sizes of F-type test
              Conditional variances                      Conditional variances
   n    c_r     1     1+u²   1+10u²  1+100u²     c_r       1     1+u²   1+10u²  1+100u²
  200  5.20   0.05   0.05   0.07    0.07       0.0705    0.05   0.05   0.09    0.09
  200  4.47   0.08   0.09   0.09    0.09       0.0579    0.09   0.11   0.13    0.15
  200  3.16   0.22   0.25   0.25    0.24       0.0375    0.25   0.28   0.34    0.35
  800  5.11   0.02   0.02   0.02    0.01       0.0134    0.02   0.05   0.09    0.09
  800  4.59   0.04   0.03   0.03    0.02       0.0132    0.03   0.05   0.09    0.09
  800  3.65   0.09   0.09   0.09    0.08       0.0109    0.09   0.10   0.12    0.17
  800  2.81   0.20   0.21   0.21    0.19       0.00776   0.20   0.22   0.27    0.29
[Figure 1: four panels; n = 400 with left triplet h = 0.13205 and right triplet h = 0.2641; n = 400 with left triplet h = 0.39615 and right triplet h = 0.5282; n = 800 with left triplet h = 0.12452 and right triplet h = 0.2264; n = 800 with left triplet h = 0.33959 and right triplet h = 0.45279]

Fig. 1. Comparisons of the empirical null distributions of the SELR statistics based on 100 simulations for different conditional variance functions. The solid, dotted and dashed curves correspond to the conditional variances 1, 1 + 10u² and 1 + 10²u², respectively, when n = 400, and to the conditional variances 1, 1 + u² and 1 + 10⁵u², respectively, when n = 800.


Here we take $1 + c_1u^2$ as the conditional variance of the stochastic error given U = u, with $c_1 = 0, 1, 10, 10^2$. Note that the powers of the SELR and F-type test statistics have the same optimal rate $n^{-2/9}$. Thus, in this study, for simplicity, we select the bandwidth by comparing several empirically specified bandwidths. We find that the combinations $h = n^{-2/9}$ with n = 200 and $h = 1.5 \times n^{-2/9}$ with n = 800 give reasonable power functions for the two alternative sequences (4.1) and (4.2). Table 2 reports the sizes of the SELR and F-type tests at the critical values given there. It is evident that the sizes of the SELR test adapt automatically to the conditional variance function, while those of the F-type test do not. This is consistent with our theoretical results and reflects one advantage of the SELR test.
[Figure 2: single panel titled "Empirical and Hypothesized chisquare CDFs"]

Fig. 2. The solid stairstep curve is the empirical null distribution of the scaled sieve empirical likelihood ratio statistic for n = 800, $h = 1.5 \times 800^{-2/9}$ and $c_1 = 1$, based on 100 repetitions, and the dashed curve is the chi-squared distribution with 6.51 degrees of freedom.


Figures 3-5 present power functions at the significance levels shown in Table 2. We have conducted simulations in many other settings; these are not reported to save space. As expected, the power deteriorates as the noise level $c_1$ increases for both the SELR and F-type tests. Figures 3 and 5 indicate that the SELR test may significantly outperform the F-type test in terms of power under the alternative

$$H_1: a_1(u) = r(u - 0.5)$$

when there is heteroscedasticity. Figure 4 shows that the F-type test can have better power than the SELR test when the level of heteroscedasticity is low, but it can perform much worse than the SELR test when the level of noise (heteroscedasticity) is high. This phenomenon can be explained by Theorem 5. For the simple model $Y = a_1(U) + \varepsilon$, Theorem 5 gives

2l{H 0s \G) = |iT(0) + nE{ ai (U)a~\U)} 

+ o(l) + (1 + o p (h 1/2 )){T* + 2W* n }. 


(4.3) 






SIEVE EMPIRICAL LIKELIHOOD RATIO 




[Figure 3: four panels, Var = 1, Var = 1 + u², Var = 1 + 10u² and Var = 1 + 100u², each plotting power against r]

Fig. 3. Comparisons of the power functions of the SELR (solid curves) and F-type tests (dashed curves) of $H_0: a_1(u) = 0$ for n = 200 and the bandwidth $h = 200^{-2/9}$, evaluated at the alternatives (4.1) for different conditional variance functions.


Note that if the function σ(U), with $\sigma^2(u) = \mathrm{Var}(\varepsilon|U = u)$, is known, we can make the transformation

$$Y' = Y/\sigma(U) = a_1(U)/\sigma(U) + \varepsilon/\sigma(U)$$

and obtain the same asymptotic expansion as in (4.3) for the SELR based on the transformed model. This means that (4.3) is adaptive to $\sigma^{-2}(U)$, in the sense that we can test $H_{0s}: a_1(\cdot) = 0$ asymptotically equally well whether or not we know the conditional variance of ε. In contrast, the F-type test does not have this property.

Fig. 4. Comparisons of the power functions of the SELR (solid curves) and F-type tests (dashed curves) of $H_0: a_1(u) = 0$ for n = 200 and the bandwidth $h = 200^{-2/9}$, evaluated at the alternatives (4.2) for different conditional variance functions.

5. Technical conditions and proofs.

5.1. Technical conditions. Define

$$A_n(u_0, \beta) = \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u_0)\, G_{ih}(u_0, \beta),$$

$$Z_n(u_0, \beta) = \max_{1 \le j \le n} \|G_{jh}(u_0, \beta)\|,$$

$$V_n(u_0, \beta) = \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u_0)\, G_{ih}(u_0, \beta)\, G_{ih}^T(u_0, \beta),$$

T /- ^ 1 ^ ^ , TT .. ,G ih (uo,P)Gl h (uo,(3) 

Vn(Uo,Q', (3) — / v &h\Uj U 0J T f>i / n\ 5 

1 + a T Gj^(?xo,p) 

D ^ 1 ^ K h (Ui-u 0 ) 9G ifc («o,/9) 

£>n{ u 0i a i p) — / , . Tf ~, / ax ) 

™ “J 1 + a T Gih(uo, (3) d(3 T 

ri ( K h (Ui-u 0 ) <9G jh (-u 0 ,/?) m 

C n (u°,a,/3) n E( 1 + a r G . h ( Uo> ^))2 5/3 r «G ift (« 0 , /?), 
[Figure 5: power-function panels, including Var = 1 and Var = 1 + u²]

Fig. 5. Comparisons of the power functions of the SELR (solid curves) and F-type tests (dashed curves) of $H_0: a_1(u) = 0$ for n = 800 and the bandwidth $h = 1.5 \times 800^{-2/9}$, evaluated at the alternatives (4.1) for different conditional variance functions.

D n (uo,a,P) 


1 " K h (Uj — up) d 2 G ih (uo,P) 
l + a T G ih (u 0 ,/3) d(3d(3 T 


E n (u 0l a,(3 ) 


If K h (Uj — up) dG ih (u 0 ,j9) dG ih (uo,0) T 

n (1 + u T G ih (u 0 ,P)) 2 d(3 T a df3 


Here and hereafter the norm of a matrix $W = (w_{ij})$ is defined by $\|W\| = (\sum_{i,j} w_{ij}^2)^{1/2}$. Let $r_0$ denote an arbitrary positive constant. Let $\Theta_0$ be a compact subset of $\mathbb{R}^{2p}$ such that $\beta_0$ is an inner point of $\Theta_0$. Define


$$\mathcal{F}_0 = \{K((\cdot - u_0)/h)\, I\{G_h(u_0, \beta)^T\psi > \delta\} : u_0 \in \Omega,\ \|\beta - \beta_0\| \le r_0,\ \|\psi\| = 1,\ 0 < \delta < 1\},$$














J. FAN AND J. ZHANG 


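where I{·} is the indicator function,

$$\mathcal{F}_1 = \{K((\cdot - u_0)/h)\, G_h(u_0, \beta) : u_0 \in \Omega,\ \|\beta - \beta_0\| \le r_0\},$$

$$\mathcal{F}_2 = \{K((\cdot - u_0)/h)\, G_h(u_0, \beta)\, G_h^T(u_0, \beta) : u_0 \in \Omega,\ \|\beta - \beta_0\| \le r_0\},$$

$$\mathcal{F}_3 = \Big\{K((\cdot - u_0)/h)\, \frac{\partial G_h(u_0, \beta)}{\partial \beta^T} : u_0 \in \Omega,\ \beta \in \Theta_0\Big\}.$$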

Let $P_n$ denote the empirical distribution of $\{(U_i, X_i, Y_i)\}$, and let $N(\delta, L_1(P_n), \mathcal{F}_j)$, j = 0, 1, 2, 3, denote the covering numbers [see, e.g., Pollard (1984), page 25, for the definition]. We impose the following technical conditions:


(K0) K has support [−1, 1] and $\max_t K(t) < \infty$.

(U0) The density of U is Lipschitz continuous and bounded away from zero.

(A1) $E[G(\varepsilon)|U] = 0$ and ε is independent of X given U.

(A2) There exist a constant ξ > 4 and a function F(Y, X) satisfying

$$\sup_{|t| \le 1,\ \|\beta - \beta_0\| \le \delta_0} \|G(Y - \beta^T Z(X, t))\|\, \|Z(X, t)\| \le F(Y, X), \qquad \sup_u E[F(Y, X)^{\xi}|U = u] < \infty.$$

(A3) For $1 \le k \le k_0$,

$$\sup_{\|\beta - \beta_0\| \le r_0,\ u_0 \in \Omega,\ |t| \le 1} E[G_k^2(Y - \beta^T Z(X, t))\, \|Z(X, t)\|^2 \,|\, U = u_0 + th] = O(1).$$

(A4) There exist $c_0(P_n)$ and some positive constants $c_0$ and $w_0$ such that $Ec_0(P_n) \to c_0$ and

$$N(\delta, L_1(P_n), \mathcal{F}_0) \le c_0(P_n)(h\delta)^{-w_0}.$$

There exist $c_1(P_n)$ and some positive constants $c_1$ and $w_1$ such that $Ec_1(P_n) \to c_1$ and

$$N(\delta, L_1(P_n), \mathcal{F}_1) \le c_1(P_n)(h\delta)^{-w_1}.$$

(A5) Uniformly for $u_0 \in \Omega$ and $|t| \le 1$, as $\|\beta - \beta_0\| \to 0$ and $h \to 0$,

$$E\Big[G\Big(Y - \beta^T Z\Big(X, \frac{U - u_0}{h}\Big)\Big) \,\Big|\, U = u_0 + th\Big] = O(h^2) + O(\|\beta - \beta_0\|).$$

(A6) There exist $c_2(P_n)$ and a positive constant $c_2$ such that $Ec_2(P_n) \to c_2$ and

$$N(\delta, L_1(P_n), \mathcal{F}_2) \le c_2(P_n)(h\delta)^{-w_2}.$$
(A7) $\sup_{\|\beta - \beta_0\| \le r_0,\ u_0 \in \Omega,\ |t| \le 1} E[\|G(Y - \beta^T Z(X, t))\|^4\, \|Z(X, t)\|^4 \,|\, U = u_0 + th] = O(1)$.

(A8) Uniformly for $\|\beta - \beta_0\| \to 0$ and $h \to 0$,

$$E\Big\{G\Big(Y - \beta^T Z\Big(X, \frac{U - u_0}{h}\Big)\Big)\, G^T\Big(Y - \beta^T Z\Big(X, \frac{U - u_0}{h}\Big)\Big) \,\Big|\, U\Big\} = V(u_0) + O(h^2) + O(\|\beta - \beta_0\|).$$

(A9) $V(u_0)$ and $\Gamma(u_0)$ defined in Section 3.1 are Lipschitz continuous in $u_0 \in \Omega$. Their minimum eigenvalues are uniformly positive in $u_0 \in \Omega$.

(A10) For any ρ > 0, there exists a constant c(ρ) > 0 such that, when h is small enough,

$$\inf_{\beta \in \Theta_0,\ \|\beta - \beta_0\| \ge \rho} \|EK_h(U - u_0)\, G_h(u_0, \beta)\| \ge c(\rho).$$

For a positive sequence $\rho_{n1} \to 0$ and a small enough constant $\rho_2$, as $n \to \infty$,

$$\inf_{\rho_{n1} \le \|\beta - \beta_0\| \le \rho_2} \|EK_h(U - u_0)\, G_h(u_0, \beta)\| \ge \rho_{n1} + O(h^2).$$

(B1) There exist a constant η > 2 and a function $F_4(Y, X)$ such that

$$\sup_u E[F_4^{\eta}(Y, X)|U = u] < \infty, \qquad \sup_{u_0, \beta} \Big\|\frac{\partial G_h(u_0, \beta)}{\partial \beta}\Big\|\, I(|U - u_0| \le h) \le F_4(Y, X).$$

(B2) For a constant c,


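(B3) Uniformly for $u_0 \in \Omega$ and $\|\beta - \beta_0\| \le r_n = o(h^{1/2})$,

$$E\Big\{K_h(U - u_0)\, \frac{\partial G_h(u_0, \beta)}{\partial \beta^T}\Big\} = D(u_0) \otimes (S \otimes \Gamma(u_0)) + o(h^{1/2}).$$

(B4) $\sup_{\|\beta - \beta_0\| \le r_0,\ u_0 \in \Omega,\ |t| \le 1} E[\|\partial G_h(u_0, \beta)/\partial \beta^T\|^2 \,|\, U = u_0 + th] < \infty$.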
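(B5) There exists a function $F_5(Y, X)$ such that

$$\sup_u E[F_5^{\eta}(Y, X)|U = u] < \infty, \qquad \sup_{\|\beta - \beta_0\| \le r_0} \Big\|\frac{\partial^2 G_h(u_0, \beta)}{\partial \beta\, \partial \beta^T}\Big\|\, I(|U - u_0| \le h) \le F_5(Y, X).$$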


















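(B6) There exists a function $F_6$ such that $\sup_u E[F_6(\varepsilon, X)\|X\|^2 \,|\, U = u] < \infty$, and that for $|U - u_0| < h$ and

$$\varepsilon^* = \varepsilon + \frac{h^2}{2} A''(u_0 + s(U - u_0))^T X\, \frac{(U - u_0)^2}{h^2} + (\beta - \beta_0)^T Z\Big(X, \frac{U - u_0}{h}\Big),$$

we have

$$\Big\|\frac{dG(\varepsilon^*)}{d\varepsilon}\Big\| \le F_6(\varepsilon, X)$$

uniformly for $|s| \le 1$, $\|\beta - \beta_0\| \le r_0$ and $u_0 \in \Omega$.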


We would like to make some comments on the conditions above. Suppressing the dependence on X, we denote Z(t) = Z(X, t). Suppose that for some $r_0 > 0$ there exist integrable functions $F_j(Y, X)$, j = 1, 2, 3, such that

$$\sup_{\|\beta - \beta_0\| \le r_0,\ t} |K'(t)|\, \|G(Y - \beta^T Z(t))\|\, \|Z(t)\| \le F_1(Y, X),$$

$$\sup_{\|\beta - \beta_0\| \le r_0,\ t} K(t) \Big\|\frac{dG(Y - \beta^T Z(t))}{d\varepsilon}\Big\|\, \|Z(t)\|\, (\|Z'(t)\| + \|Z(t)\|) \le F_2(Y, X),$$

$$\sup_{\|\beta - \beta_0\| \le r_0,\ t} K(t)\, \|G(Y - \beta^T Z(t))\|\, \|Z'(t)\| \le F_3(Y, X).$$

Then, for some positive constant c,

$$\Big\|K\Big(\frac{U - u_1}{h}\Big) G_h(u_1, \beta_1) - K\Big(\frac{U - u_2}{h}\Big) G_h(u_2, \beta_2)\Big\| \le c\{F_1(Y, X) + F_2(Y, X) + F_3(Y, X)\}\Big\{\frac{|u_1 - u_2|}{h} + \|\beta_1 - \beta_2\|\Big\}.$$

Thus the second part of condition (A4) holds if $EF_j(Y, X) < \infty$, j = 1, 2, 3. Similar remarks can be made about conditions (A6) and (B2).

As pointed out in Section 2, $EK_h(U - u_0)G_h(u_0, \beta_A) = 0$, $u_0 \in \Omega$, can be viewed as local estimating equations associated with the equations $E[G(Y - A(U)^T X)|U = u_0] = 0$, $u_0 \in \Omega$, as A(u) is expanded around each $u_0$. In this sense, the first part of (A10) implies that when $\beta_A$ (the coefficients of the approximation of A) is away from the true value $\beta_0$ (the coefficients of the approximation of $A_0$), $\|EK_h(U - u_0)G_h(u_0, \beta_A)\|$ is away from 0. This is a little stronger than the requirement that $E[G(Y - A^T(U)X)|U] = 0$ if and only if A equals the true value. The second part of (A10) is a local condition, which says that locally $\|EK_h(U - u_0)G_h(u_0, \beta)\|$ is bounded below by the norm of a linear function of β near the true value $\beta_0$. For instance, assume that the first component of G is $Y - A^T(U)X$ and that $E[XX^T|U = u]$ is positive definite uniformly in u. Then we have

$$\|EK_h(U - u_0)G_h(u_0, \beta_A)\| \ge \Big\|EK_h(U - u_0)\Big\{Y - \beta_A^T Z\Big(X, \frac{U - u_0}{h}\Big)\Big\} Z\Big(X, \frac{U - u_0}{h}\Big)\Big\|$$

$$= \Big\|O(h^2) + (\beta_0 - \beta_A)^T \int K(t)\, E[Z(X, t)Z^T(X, t)|U = u_0 + th]\, f(u_0 + th)\, dt\Big\|$$

$$\ge c\|\beta_0 - \beta_A\| + O(h^2),$$

provided h is small enough.

5.2. Proofs. Note that Lemmas 1-8 are used in this section and their 
proofs can be found in the Appendix. 

Proof of Theorem 1. First of all, using Lemma 3, we obtain 
P(uq) — Po = Opfi 1 / 2 A n -1 ^), d(ito) = o p (h ll/2 A mT 1 ^). 
Furthermore, by the definitions of $\hat\alpha$ $(= \hat\alpha(u_0))$ and $\hat\beta$ $(= \hat\beta(u_0))$, we have


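$$0 = \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u_0)\, \frac{G_{ih}(u_0, \hat\beta)}{1 + \hat\alpha^T G_{ih}(u_0, \hat\beta)},$$

$$0 = \frac{1}{n}\sum_{i=1}^{n} K_h(U_i - u_0)\, \frac{\hat\alpha^T\, \partial G_{ih}(u_0, \hat\beta)/\partial \beta^T}{1 + \hat\alpha^T G_{ih}(u_0, \hat\beta)}.$$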

Then, invoking a Taylor expansion, we have

0 — A n (n 0 , A)) kn (no, Clnl , Pnl)A 

T (n o, Clnl, Pnl) (no, o^ii, /3n,l)} (/3 A)), 

0 = {S rl (?Xo, Ofn2, PnP) C n {uo, Ot n 2i Pn2)}o! 

T {L?n (no, 2, Pn2) E n {uQ, Ol n 2, Pn2)}(P A)), 

where $\alpha_{nj}$, j = 1, 2, lie between $\hat\alpha$ and 0, and $\beta_{nj}$, j = 1, 2, lie between $\hat\beta$ and $\beta_0$. By using Lemmas 4-8, the above equations become

~A n {u 0 , Po) = -(1 + o p (h 1/2 ))V (n 0 ) ® (-S' <g> r(it 0 ))d 

+ {o p (h 1/2 ) + D(uq) <g> (S' <g> r(n 0 ))}(/3 - Po), 

0 = {o p (h l/2 ) + D(u 0 ) <8> (S <g> r(n Q ))}d + o p (h 1/2 )0 - Po). 
It follows that

($ ~ A)) = -[{Diu^yV -1 (uq)D(uq)Y 1 D{u 0 )V~ l Oo) 

® {S~ l <g>r(n 0 ) _1 ) + o p (h 1/2 )\A n (u 0 ,p 0 ), 
a = [I/" 1 (n 0 ) - V~ 1 (u 0 )(D T (u 0 )V~ 1 (u 0 )D(u 0 )y 1 

x D(u 0 )D T (u 0 )V~ 1 (u 0 )+ o p (h 1/2 )\A n (u 0 ,/3 0 ). 
Observe that for $U_i^* = u_0 + s(U_i - u_0)$, $0 < s < 1$, and for

e* = y- A T {Ui)Xi + ^A'yuyxyUi - u 0 ) 2 
+ (P-Po yz(xi,^^j 

we have 

-i 71 / -jrj \ 

A n (u 0 ,P ) = -J^K h (U i -uo)G(et)<8>z(x i ,-i^) 

-I 77- / TT \ 

= - E K h(Ui - u 0 )G(e i ) (8) Z (x,, 

+ ~2^p( i) + Op(\\P ~ M), 

where the last equality follows from condition (B6) (or from the linearity of A). Now the proof can be completed by some simple calculations. □


Proof of Theorem 2. Note that under the conditions of Theorem 1 we have $h \to 0$ and $nh^{3/2} \to \infty$. Recall that, given U, ε and X are independent by condition (A1). By Taylor expansion and Lemma 4, there are matrices $V^*(U_j)$ such that, as $n \to \infty$, uniformly in $U_j$,

K(^) = m) ® (S' ® r(i/j))(i + o p (h 1/2 )), 

1 n 

mo=- Uj)G ih {Uj,p). 

Tl . 1 

The last two equalities lead to 


((G) = i>(Gjr £ Kh v!m Ul) ,„ G.i.(0,/3) 


i=i 


^E”=i^(^-C00 




j=i 








(5.1) 



£™=i K h (Um-Uj) 


V*{Uj 


-^V n (U j ,s*a(U j J))^a(U j ) 

1 n 

= -(1 + o p (h^)) E rHUjWUjYiviUj) ®(s® r Mmuj), 

3 = 1 

where $0 < s^* < 1$, and $V_n(u, \alpha, \beta)$ is defined in Section 5.1. Note that we draw the factor $1 + o_p(h^{1/2})$ out of the summation in (5.1) because the $o_p(h^{1/2})$ is uniform with respect to $U_j$, $1 \le j \le n$, and $\hat\alpha(U_j)^T[V(U_j) \otimes (S \otimes \Gamma(U_j))]\hat\alpha(U_j)/\hat\alpha(U_j)^T\hat\alpha(U_j)$, $1 \le j \le n$, are bounded away from 0 and ∞ [see condition (A9)]. It follows from the definition of C(u) in Section 3.1 that C(u)V(u)C(u) = C(u). Thus, combining (5.1) and Theorem 1, we obtain


/1 \ " i " 

1(G) = i- + o p (h 1 / 2 ) j E ~Y. K h(Ui - U j )(C(U j )G(si)) T 

' ' j=l i=l 

( r 

E U 2 - 1 (U i -U j )T- 1 (U j )X i /h 
-[V(U j )®{S®T(U j ))} 


(5.2) 


f(Uj ) 

1 n 

x - E K h(U k ~ Uj)(C(U^G^k)) 
T~\Uj)X k 


k =1 


= (l + o p (h 1/2 )) 


^{Uk-U^T-^U^Xk/h 


+ Cn 


1 


n n n 


x^EEE ^(l/i - Uj)K h (Uk - U j )f~ 1 (Uj 

zn *=ifc=ij=i 


x G^)C(tL)G( £A; ) 


x (1 + ^2 


^(Ui-Uj^Uk-Uj) 


h 2 


xXTT-H^fc + Cn, 

where $C_n = 0$ when $A_0$ is linear, and otherwise $C_n = O_p(nh^4)$. The last term in (5.2) can be decomposed as follows:

$$(1 + o_p(h^{1/2}))\, l(G) = T_{n11} + T_{n121} + T_{n122} + T_{n21} + T_{n22} + C_n, \tag{5.3}$$


where 

-t n n 

T n ii = — EE KhVi-Ujfr'iUj) 

Tl 1 

2=1 J = 1 


x (G T (e i )C7(l7 i )G(e i ) - i?[G r fe)C'(^)G , (e*)l(^ ) ^)]} 
x (1 + ^2


-i m-Ujf 


h 2 


XTT-'iU^Xi 


T nl2 i = ^Y,J2 K h(Ui-U j ) 2 E[G T (£ i )C(U j )G(£ i )\(U i ,U J )\ 


i= 1 3 = 1 


x 1 + ^2 


-i (Ui-Uj) 2 


h 2 


x {XlT~ l (Uj)Xi - J E[X-r- i (C/ i )X i |(C/ i ,C/ j )]}r i (C/ i ), 


Tn 122 = ^EE^i - ^■) 2 ^[G T (£ i )C(^)G(e i )l(^,^)] 


*=i j=i 


x 1 + ^2 


-i (Ui-Uj? 


h 2 


x E[XTT- l QUj)Xi I([/,, Uj)\ r 1 ([/,■), 


T n2 i = ^E E R h( u i - Uj)K h (U k - C/,-)G' T (£i)C'(C/,)G(£ fe ) 




X 1 + 


h 


-i 






i^k 


u k - Ui 


+ K 


h 

Ui-Uk 

h 


G T (ei)C {Uj)G{£ k )XJT~ 1 {Ui)X k f- L (JJi) 


G T {£i)C{U k )G{£ k )XlT-^U k )X k f-\U k ) . 


Observe that, as $nh^{3/2} \to \infty$ and $h \to 0$,


Tn 122 = ^ 2 ptr(C{U i )V(U i ))pf- 2 (U i ) 


- ^■) 2 tr(C(C/ i )y(C/ i )) 


(5.4) 




X (1 + ^2 




if(0) 2 f r tr(C(C/)F(C/)) 

n/i 2 1 f 2 {U) 


h 2 


triT-^UjnummfiUj)) 


-l 


+ OJn" 1 / 2 ) + 


= ^(/T 1 /*) + ^ 
where

= \Y, K h{Ui - Uj) tl(C(Uj)V (Ui)) 
n Vti 

x (i+ ^ 1( %" [/j ) 2 ) trir-'iujnummfiuj))- 1 

= ——j K 2 (t)(l +/j,2 1 t 2 )dt +Opfi- 1 / 2 ). 

This is because 

E^n = (1 + 0(h))£ JK(t) 2 {\ + ^H 2 ) dtE{tv(C(U)V(U))f-\U)} 
= p{k °~ 1 \ i + 0(h))\n\ I K 2 (t)(i + ^H 2 )dt, 

Var('I' n ) < O^^hT 2 ) = o(/i -1 ). 


By a similar argument, we have the following equalities: 
K{ 0) 2 n 


T n 121 = 


i nh ) 2 “I 


J2^cmvm) 


x (XTT-^U^Xi - E[XTT- l (Ui)Xi\Ui])f- l (Ui) 


tt»—1/ 


p-1/ 


( 5 - 5 ) + - U i? toiC{Uj)V{Uj)) (1 + h 2 -1 {Ui U])2 

Tl . / . \ 

*¥=3 


/l 2 


K{ 0) 2 
nh 2 


O p (n _1/2 ) + o p (/i^ 1/2 ), 


(5.6) T n 22 — o p (h 1 ^ 2 ), 

(5.7) T n21 = o^/i" 1 / 2 ) + ^ £ GT(e i )^ifchG ; (e fc ), 

n 

where $\Phi_{ikh}$ is defined in (3.2) and the last equality follows from Hoeffding's decomposition for the variance of U-statistics. Now (5.3)-(5.7) imply (3.3). Equation (3.4) can be proved by a similar argument by showing that

l{Ao\G) = (1 + o p (h 1/2 )) 

1 n 

X ^E A'niUjiPoWiUj) ® (S ® r([/ i ))]- 1 7l n ([/ i , Po). 

3 =1 


The proof is complete. □ 
Proof of Theorem 3. Invoking the asymptotic representations in Theorem 2, we need only prove the asymptotic normality of $T_n$. To this end, we first calculate the variance of $T_n$,

Var(T n ) = ^W)£{G T ( £1 )$ 12/1 G( £2 )}2 

= {2 n ^ tr {E[^ 12h G(e 2 )G T (e 2 W 123h G(e 1 )G T (s 1 )]} 

= 2(1 / + ° i^)) tT {E[Kl(U 2 - U 1 ) 2 C(U 1 )G(e 2 )G T (e 2 )C(U 1 )G(£ 2 ) 

n(n — 1) 

xG{e 2 yx T l T- l (U 1 )X 2 X T 2 Y- 1 (U 1 )X l ]} 
= 2(1 + O(ft))p(fc 0 - 1* j K , (tfdt 
n[n — 1) h J 

(5.8) 

Let $D_i = (\varepsilon_i, X_i, U_i)$, $1 \le i \le n$, and let $\mathcal{U}_k$ be the σ-algebra generated by $D_1, \ldots, D_k$, $1 \le k \le n$. Set $\Phi_h(D_i, D_k) = G^T(\varepsilon_i)\Phi_{ikh}G(\varepsilon_k)$, $\eta_{n1} = 0$ and

$$\eta_{nk} = E[T_n|\mathcal{U}_k] - E[T_n|\mathcal{U}_{k-1}].$$

Then

$$\eta_{nk} = \frac{2}{n(n-1)}\sum_{j=1}^{k-1} \Phi_h(D_j, D_k), \qquad 2 \le k \le n,$$

and $\{\eta_{nk}, \mathcal{U}_k\}$ is a sequence of martingale differences. By Theorem 4 of Shiryayev [(1996), page 543], it suffices to show

$$\mathrm{Var}^{-1}(T_n)\sum_{k=2}^{n} E[\eta_{nk}^2|\mathcal{U}_{k-1}] \to 1 \quad \text{in probability} \tag{5.9}$$

and

$$\mathrm{Var}^{-2}(T_n)\sum_{k=1}^{n} E\eta_{nk}^4 \to 0. \tag{5.10}$$

In the following, $D = (\varepsilon, X, U)$ denotes a generic random variable independent of $D_i$ and $D_k$. To prove (5.9) and (5.10), we need the following equalities for i < j:

E[$ h {Di, Dj) 2 \Di] 

= \j K*(t) 2 dtXfr -1 (Ui)XiG T (ei)C (Ui)G(si)(l + 0(h)), 

Ei&hiDuD^hiDjtDMD^Dj)] 

= G(£i) T E[K h (U - Ui)K h (U - UkMUdViUMUj) 

xtviF- 1 {U^XX^iU^XiXj^D^D^Giej), 
E®h( D uD)$l(D j ,D)

= ^(1 + 0(h)) J j K*(t) 2 K*(s) 2 dtds 

x J E[(Xjr“ 1 (^)^) 2 (G T ( £j )C(C/ J )G(e i )) 2 ], 

E®h( D i,D k ) 


= 0(l)(l + 0(h))± J K*{tfdt. 


These follow directly from the assumption that ε and X are independent given U. With the above equalities, we can now derive

n 

£-%nfcl n fc-l] 


k =2 


= £ 


'k -1 


%n 2 (n- l) 2 


E [® h (Dj,D k ) 2 \Dj 


k -1 


+ J2El®h(Di,D k ) +/,, (£>,, L» fc ) | ( A , Dj )] 


4 g 1 + O(h) 


£n 2 (n-D- h 


J I\*(t) 2 dt 


(5.11) x X J T r- i ([/ i )X i G T (e,)C(t/ i )G(e i ) 

n g n —1 

k =2 ' ' i<j 

_ (l + 0(h))4/ K*{t) 2 dt 


n 2 {n — l) 2 h 


n—1 


x £> - *)X?T- 1 (i7 i )X i C? T (e i )C , (Z7 i )G(e i ) + T* 


Z=1 


= (l + o(l)) 


2 jK*(t) 2 dt 


where 
T — 

n — 


n(n — 1) 

x E{E[XlT-\U i )XmE[G^e i )C{U i )G{em\} + ^ 
= (l + o(l))Var(T n ) + T n , 


8 


n 2 (n — l ) 2 
n—1

x j) G ( £ i) T 

i<j 

x E[K* h (U - Ui)K* h (U - U k )C(Ui)V(U)C(U k ) 

xtr{T~\U k )XX T T-\U i )X i X T k )\(y il U k ,X i ,X k )\G{e k ). 


Note that 


E[ ?n} 2 


64 

n 4 (n — l) 4 

n —1 

x J2( n ~ k ) 2 

i<k 


x £{G( £i ) r £K([/ - Ui)K* h (U - U k )C(Ui)V(U)C(U k ) 
xtr(r~ 1 (U k )XX T 

xr-\U i )X i XZ)\(U i ,X i ,U k ,X k )\G{e k )} 2 

= o(4) v ar ( r„), 


which implies T n = o p (Var(T n )), and where K*(t) * K*(t ) is the convolution 
of K*(t ) with itself. Substituting the above equality into (5.11), we get (5.9). 
Analogously, (5.10) follows from the following calculations: 


5Z E Vnk = 


0 ( 1 ) 


n k —1 


k =2 


n 4 (n-l) 4 ^ 2 ^. 




= 0(Var(T n ) 2 ) 


(n — l ) 2 


0(n) + O 


k — 1 


The proof is complete. □ 


Proof of Theorem 4. The first part is similar to the proof of Theorem 3; the details are omitted. To show the second part, we recall that $\Gamma(u_0) = E[XX^T|U = u_0]f(u_0)$ and write

$$X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}, \qquad \Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix} \qquad \text{and} \qquad \Gamma_{11,2} = \Gamma_{11} - \Gamma_{12}\Gamma_{22}^{-1}\Gamma_{21},$$

where $X^{(1)}$ is $p_1$-dimensional, $\Gamma_{11}$, $\Gamma_{12}$, $\Gamma_{21}$ and $\Gamma_{22}$ are $p_1 \times p_1$, $p_1 \times p_2$, $p_2 \times p_1$ and $p_2 \times p_2$ matrices, and $p_2 = p - p_1$. Following the same steps as in the proof of Theorem 3, we first extend Theorem 1 as follows:



x 77;Oo)(l + o p (h 1/2 )) + O p (h 2 ), 


a*(u 0 ) = -J2 K h(Ui~u 0 ) 

n r—f 


1 x 2 - 


i —1 


X 



- V~ l D(D T V~ l D )- 1 D T V- l G{£i) 

/ n \ ' 



x (1 + o p (h 1/2 )) + O p (h 2 ). 


Then, by using the decomposition formula in Fan, Zhang and Zhang (2001), we have


XlTiUk^Xk 

= {Af )T - xf )r T 22 {U k )T 2 l {U k )}T^l 2 {U k ){X^ - T 12 (U k )T£(U k )X®} 



The remaining part is very similar to the proof of Theorem 3. The details 
are omitted. □ 

Proof of Theorem 5. The argument is similar to that in Fan, Zhang and Zhang (2001) but more tedious. For simplicity, we derive it heuristically. Write

$$\begin{aligned}l(H_0^s\mid G) ={}& (1+o_p(h^{1/2}))\,\frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times G^T(\varepsilon_i+A(U_i)^TX_i)\,V^{-1}(U_j)\,G(\varepsilon_k+A(U_k)^TX_k)\,X_i^T\Gamma^{-1}(U_j)X_k - l(G)\\ ={}& (1+o_p(h^{1/2}))(W_{n0}+W_{n1}+W_{n2}+W_{n3}) - l(G)\end{aligned}\tag{5.12}$$


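with

$$\begin{aligned}W_{n0} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times G(\varepsilon_i)^TV^{-1}(U_j)G(\varepsilon_k)\,X_i^T\Gamma^{-1}(U_j)X_k,\\ W_{n1} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times G(\varepsilon_i)^TV^{-1}(U_j)\frac{\partial G(\varepsilon_k^*)}{\partial\varepsilon}\,X_i^T\Gamma^{-1}(U_j)X_kX_k^TA(U_k),\\ W_{n2} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times\Big(\frac{\partial G(\varepsilon_i^*)}{\partial\varepsilon}\Big)^TV^{-1}(U_j)G(\varepsilon_k)\,X_i^T\Gamma^{-1}(U_j)X_kX_i^TA(U_i),\\ W_{n3} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times\Big(\frac{\partial G(\varepsilon_i^*)}{\partial\varepsilon}\Big)^TV^{-1}(U_j)\frac{\partial G(\varepsilon_k^*)}{\partial\varepsilon}\,A(U_i)^TX_iX_i^T\Gamma^{-1}(U_j)X_kX_k^TA(U_k),\end{aligned}$$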
where $\varepsilon_i^*$ lies between $\varepsilon_i$ and $\varepsilon_i + A(U_i)^TX_i$, and $\varepsilon_k^*$ lies between $\varepsilon_k$ and $\varepsilon_k + A(U_k)^TX_k$. Under some regularity conditions,

$$W_{n1} = \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}G(\varepsilon_i)^T\Big\{\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\cdots\Big\}\cdots + o_p(h^{1/2}) = \frac{W_n^*}{2} + o_p(h^{1/2}),$$

$$W_{n2} = W_{n1},$$

where $W_n^*$ is defined in (3.5). Similarly, we write

$$W_{n3} = W_{n31} + 2W_{n32} + W_{n33},$$

where, when $EA(U)^TXX^TA(U) = O(\cdots)$,


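$$\begin{aligned}W_{n31} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times\Xi_i^TV^{-1}(U_j)\Xi_k\,A(U_i)^TX_iX_i^T\Gamma^{-1}(U_j)X_kX_k^TA(U_k)\\ ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}(K^* * K^*)_h(U_i-U_k)\,\Xi_i^TV^{-1}(U_i)\Xi_k\,A(U_i)^TX_iX_i^T\Gamma^{-1}(U_i)X_kX_k^TA(U_k) + o_p(h^{-1/2})\\ ={}& O(\cdots) + \cdots + o_p(h^{-1/2}),\end{aligned}$$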


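$$\begin{aligned}W_{n32} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\cdots\frac{\partial G(\varepsilon_i^*)}{\partial\varepsilon}\cdots\\ ={}& \frac{W_n^*}{2} + o_p(h^{-1/2}),\\ W_{n33} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\cdots\frac{\partial G(\varepsilon_i^*)}{\partial\varepsilon}\cdots\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\cdots V^{-1}(U_j)\cdots X_kX_k^TA(U_k)\\ ={}& O\Big(\frac{\cdots}{nh^2}\Big) + \cdots E\Big\{E\Big[\frac{\partial G(\varepsilon)}{\partial\varepsilon}\Big|\,U\Big]^TV^{-1}(U)E\Big[\frac{\partial G(\varepsilon)}{\partial\varepsilon}\Big|\,U\Big]A(U)^TXX^TA(U)\Big\}(1+o(1)),\end{aligned}$$

with $\Xi_i$ defined in (3.6). Recall that $R_{1n}$ and $R_{2n}$ are defined in (3.7) and (3.8), respectively. Observe that, as $EA(U)^TXX^TA(U) = O(\cdots)$, we have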

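$$\begin{aligned}W_{n31} ={}& \frac{2}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}(K^* * K^*)_h(U_i-U_k)\,\Xi_i^TV^{-1}(U_i)\Xi_k\,A(U_i)^TX_iX_i^T\Gamma^{-1}(U_i)X_kX_k^TA(U_k) + o_p(h^{-1/2})\\ ={}& O(\cdots) + \cdots + o_p(h^{-1/2}),\\ W_{n32} ={}& W_n^*/2 + o_p(h^{-1/2}),\\ W_{n33} ={}& O\Big(\frac{\cdots}{nh^2}\Big) + \cdots E\Big\{E\Big[\frac{\partial G(\varepsilon)}{\partial\varepsilon}\Big|\,U\Big]^TV^{-1}(U)E\Big[\frac{\partial G(\varepsilon)}{\partial\varepsilon}\Big|\,U\Big]A(U)^TXX^TA(U)\Big\}(1+o(1)).\end{aligned}$$

Similarly, we have

$$\begin{aligned}l(G) ={}& (1+o_p(h^{1/2}))\,\frac{1}{2\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times G^T(\varepsilon_i)C(U_j)G(\varepsilon_k) + 2S_{n1} + S_{n2},\end{aligned}\tag{5.13}$$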


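where

$$\begin{aligned}S_{n1} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times G^T(\varepsilon_i)C(U_j)\frac{\partial G(\varepsilon_k)}{\partial\varepsilon}\cdots A''(U_k^*)^T(U_k-U_j)^2X_k\\ ={}& O_p(n(nh)^{-1}h^2)\\ ={}& O_p(h)\end{aligned}$$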


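and

$$\begin{aligned}S_{n2} ={}& \frac{1}{\cdots}\sum_{i=1}^{n}\sum_{k=1}^{n}\sum_{j=1}^{n}K_h(U_i-U_j)K_h(U_k-U_j)\Big[1+\mu_2^{-1}\frac{U_i-U_j}{h}\frac{U_k-U_j}{h}\Big]\frac{1}{f(U_j)}\\ &\times\Big(\frac{\partial G(\varepsilon_i)}{\partial\varepsilon}\Big)^TC(U_j)\frac{\partial G(\varepsilon_k)}{\partial\varepsilon}\,A''(U_j)^TX_iX_i^T\Gamma^{-1}(U_j)X_kX_k^TA''(U_j)(U_i-U_j)^2(U_k-U_j)^2\\ ={}& \cdots nh^4\,E\{D^T(U)C(U)D(U)A''(U)^TXX^TA''(U)\}\\ &\times\iint t^2(s+t)^2K(t)K(s+t)\{1+\mu_2^{-1}t(s+t)\}\,dt\,ds\,(1+o_p(1)),\end{aligned}$$

where $U_k^*$ lies between $U_k$ and $U_j$. Now the desired result follows from (5.12) and (5.13). This proves the theorem. □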

APPENDIX 

Lemma 1. Under conditions (K0), (U0) and (A2)–(A4), if there exist some positive constants $b_0$, $b_1$ and $\eta < 1/2$ such that $b_0 < hn^{\eta} < b_1$, then there exists a sequence of positive constants $d_n\to 0$ such that

$$A_n(u_0,\beta) = E\Big\{K_h(U-u_0)\,G\Big(Y-\beta^TZ\Big(X,\frac{U-u_0}{h}\Big)\Big)\otimes Z\Big(X,\frac{U-u_0}{h}\Big)\Big\} + o_p(n^{-1/\xi}\wedge h^{1/2})\,d_n.$$

Furthermore, if condition (A5) holds and $\eta > 1/(2\xi)$, then, uniformly in $\|\beta-\beta_0\| < r_n = o(n^{-1/\xi})d_n$,

$$A_n(u_0,\beta) = o_p(n^{-1/\xi})\,d_n.$$

Proof. For any positive constant $M_n$, we can write

$$A_n(u_0,\beta) = EK_h(U-u_0)G\Big(Y-\beta^TZ\Big(X,\frac{U-u_0}{h}\Big)\Big)\otimes Z\Big(X,\frac{U-u_0}{h}\Big) + A_{n1}(u_0,\beta) + A_{n2}(u_0,\beta),$$

where

$$A_{n1}(u_0,\beta) = \frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)G_{ih}(u_0,\beta)I(F(Y_i,X_i)\le M_n) - EK_h(U-u_0)G_h(u_0,\beta)I(F(Y,X)\le M_n)$$

and

$$A_{n2}(u_0,\beta) = \frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)G_{ih}(u_0,\beta)I(F(Y_i,X_i)>M_n) - EK_h(U-u_0)G_h(u_0,\beta)I(F(Y,X)>M_n).$$


Note that

$$E\|A_{n2}(u_0,\beta)\| \le 2EK_h(U-u_0)\|G_h(u_0,\beta)\|I(F(Y,X)>M_n) \le cM_n^{-\xi+1}.\tag{A.1}$$

Consider the following empirical processes:

$$\nu_n(g) = n^{-1/2}\sum_{i=1}^{n}\{g(Y_i,X_i,u_0,\beta) - Eg(Y,X,u_0,\beta)\},\qquad g\in\mathcal F_n = \{M_n^{-1}g: g\in\mathcal F_1\},$$

where $\mathcal F_1$ is defined as in Section 5.1. It follows directly from assumption (A4) that

$$N(\delta, L_1(P_n), \mathcal F_n) \le C_1(P_n)(h\delta M_n)^{-w_1}.$$

Obviously, by condition (A3), for $g = K((\cdot-u_0)/h)G(u_0,\beta)\in\mathcal F_n$,

$$E\|g(Y,X,u_0,\beta)\|^2 \le chM_n^{-2}\sup_{u_0,\,t,\,\|\beta-\beta_0\|<r_0}E_{Y\mid U=u_0+th}\{G^2(Y-\beta^TZ(X,t))\|Z(X,t)\|^2\} \le v = O(hM_n^{-2}).$$
Now let $M_n = n^{s_0}$, $\delta_n = (h^{1/2}\wedge n^{-1/\xi})(\log n)^{-1}$ and $M = \delta_n n^{1/2}hM_0M_n^{-1}$. Using Lemma 2 in Zhang and Gijbels (2003), we have

$$\begin{aligned}P\Big\{\sup\|A_{n1}(u_0,\beta)\|\,\delta_n^{-1} > M_0\Big\} &= P\Big\{\sup_{g\in\mathcal F_n}|\nu_n(g)| > M\Big\}\\ &\le c_1\big(n^{1/2}(MhM_n)^{-1}\big)^{w_1}\exp\{-c_2M^2/v\} + c_2v^{-w_1}\exp(-nv)\\ &= O\big((h^2\delta_n)^{-w_1}\big)\exp\{-c_3\delta_n^2nh^2M_0^2M_n^{-2}/(hM_n^{-2})\}\\ &\quad + c_2\,O(hM_n^{-2})^{-w_1}\exp(-c_4nhM_n^{-2}).\end{aligned}\tag{A.2}$$


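The last terms in (A.1) and (A.2) are $o(\delta_n)$ and $o(1)$, respectively, if

$$b_0 < hn^{\eta} < b_1,\qquad nh^2/\log n\to\infty,\qquad n^{1-2/\xi}h/\log n\to\infty,\qquad nhM_n^{-2}/\log n\to\infty,\qquad M_n^{-\xi+1}\delta_n^{-1}\to 0.$$

The above requirements are fulfilled provided that, for $s_0 > 0$,

$$b_0 < hn^{\eta} < b_1,\qquad 0 < \eta < \min\Big\{\frac12,\ 1-\frac{2}{\xi}\Big\},\qquad \max\Big\{\frac{\eta}{2(\xi-1)},\ \frac{1}{\xi(\xi-1)}\Big\} < s_0 < \frac{1-\eta}{2}.$$

These conditions are equivalent to

$$\max\Big\{\frac{\eta}{2(\xi-1)},\ \frac{1}{\xi(\xi-1)}\Big\} < s_0 < \frac{1-\eta}{2},\qquad b_0 < hn^{\eta} < b_1,\qquad 0 < \eta < \min\Big\{\frac12,\ 1-\frac{2}{\xi},\ 1-\frac{1}{\xi},\ 1-\frac{2}{\xi(\xi-1)}\Big\} = \frac12,$$

since $\xi > 4$.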
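The reduction of the bound on $\eta$ to $1/2$ is elementary arithmetic. The sketch below, assuming the bounds read as displayed above, uses exact rational arithmetic to confirm that the minimum equals $1/2$ once $\xi \ge 4$ (but not for smaller $\xi$), and that the interval for $s_0$ is nonempty for such $\eta$. The helper names `eta_upper_bound` and `s0_interval` are hypothetical.

```python
from fractions import Fraction

def eta_upper_bound(xi):
    """min{1/2, 1 - 2/xi, 1 - 1/xi, 1 - 2/(xi(xi - 1))}, the bound on eta displayed above."""
    xi = Fraction(xi)
    return min(Fraction(1, 2), 1 - 2 / xi, 1 - 1 / xi, 1 - 2 / (xi * (xi - 1)))

def s0_interval(eta, xi):
    """(lower, upper) with max{eta/(2(xi - 1)), 1/(xi(xi - 1))} < s0 < (1 - eta)/2."""
    eta, xi = Fraction(eta), Fraction(xi)
    lower = max(eta / (2 * (xi - 1)), 1 / (xi * (xi - 1)))
    upper = (1 - eta) / 2
    return lower, upper

for xi in (Fraction(9, 2), 5, 6, 10):
    print(xi, eta_upper_bound(xi))          # 1/2 whenever xi >= 4
print(eta_upper_bound(3))                   # 1/3: the reduction fails for small xi
print(s0_interval(Fraction(2, 5), 5))       # nonempty interval for s0
```

Exact fractions avoid any floating-point ambiguity at the boundary case $\xi = 4$, where $1 - 2/\xi$ equals $1/2$ exactly.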

Let $d_n = (\log n)^{-1}$ and let $f$ be the density of $U$. Now we can complete the proof if we note that, for $\|\beta-\beta_0\| < o(n^{-1/\xi})d_n$ and $\eta > 1/(2\xi)$, we have

$$\begin{aligned}EK_h(U-u_0)G\Big(Y-\beta^TZ\Big(X,\frac{U-u_0}{h}\Big)\Big)\otimes Z\Big(X,\frac{U-u_0}{h}\Big) &= \int K(t)\,E\big[G(Y-\beta^TZ(X,t))\otimes Z(X,t)\,\big|\,U=u_0+th\big]f(u_0+th)\,dt\\ &= O(h^2) + O(\|\beta-\beta_0\|)\\ &= O(n^{-2\eta}) + o(n^{-1/\xi}d_n)\\ &= o(n^{-1/\xi}d_n)\end{aligned}$$

by using condition (A5). □
Lemma 2. Under conditions (K0), (U0), (A2), (A6) and (A7), as $n\to\infty$, $b_0 < hn^{\eta} < b_1$ and $0 < \eta < 1/2$, we have

$$\begin{aligned}V_n(u_0,\beta) &= EK_h(U-u_0)G_h(u_0,\beta)G_h(u_0,\beta)^T + o_p(h^{1/2})\\ &= V(u_0)\otimes(S\otimes\Gamma(u_0)) + o_p(h^{1/2}) + O(\|\beta-\beta_0\|).\end{aligned}$$

Proof. The proof is similar to that of Lemma 1 and is thus omitted. □


Lemma 3. Under conditions (K0), (U0), (A1)–(A10) and (B1), if $b_0 < hn^{\eta} < b_1$ and $1/(2\xi) < \eta < 1/2$, then there exists a sequence of positive constants $d_n\to 0$ such that, as $n\to\infty$,

$$\hat\beta(u_0) = \beta_0(u_0) + o_p(n^{-1/\xi}\wedge h^{1/2})\,d_n,\qquad \alpha_n(u_0,\hat\beta(u_0)) = o_p(n^{-1/\xi}\wedge h^{1/2}).$$


Proof. First of all, by Lemma 1, there exists a sequence of positive constants $d_n\to 0$ such that

$$A_n(u_0,\beta_0) = o_p(n^{-1/\xi}\wedge h^{1/2})\,d_n.\tag{A.3}$$

Note that condition (A2) implies

$$Z_n(u_0,\beta) = o_p(n^{1/\xi})\tag{A.4}$$

uniformly in $u_0\in\Omega$ and $\|\beta-\beta_0\| < r_0$. Define the function

$$g_n(\alpha,\beta) = \frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)\frac{G_{ih}(u_0,\beta)}{1+\alpha^TG_{ih}(u_0,\beta)}.$$

Then, following the argument of Owen (1990) and using conditions (K0), (U0), (A1), (A4), (A5), (A8), (A9) and (B1), we can show that, for large $n$, $\alpha_n(u_0,\beta)$ exists and satisfies the equation

$$g_n(\alpha_n(u_0,\beta),\beta) = 0\tag{A.5}$$

when $\|\beta-\beta_0\| < r_0$ and $r_0$ is small. To see this, we first note that, for a constant $\delta > 0$ small enough,

$$\inf_{\|\psi\|=1,\ u_0\in\Omega}\int K(t)\,E\big[I\{\psi^T(G(\varepsilon)\otimes Z(X,t)) > \delta\}\,\big|\,U=u_0\big]\,dt > \delta,$$

which yields

$$\inf_{\|\psi\|=1,\ \|\beta-\beta_0\|<r_0}\int K(t)\,E\big[I\{\psi^TG_h(u_0,\beta) > \delta/2\}\,\big|\,U=u_0+th\big]f(u_0+th)\,dt > \delta/2\tag{A.6}$$

as $h\to 0$ and $r_0$ is small enough. This is the main consequence of conditions (A1) and (A9). Define

$$H_n(\beta,\psi) = \frac{1}{n}\sum_{i=1}^{n}w_h(U_i,u_0)\,I\{G_{ih}(u_0,\beta)^T\psi > \delta\}.$$

Then, under conditions (A1)–(A4) and (A8)–(A10), using (A.6) and the strong convergence of empirical processes [Pollard (1984) and van der Vaart and Wellner (1996)], we can show that there exists $\delta > 0$ such that, for small $r_0$ and large $n$, $\inf_{\beta,\|\psi\|=1}H_n(\beta,\psi) > \delta$ almost surely. This shows that $0$ is contained in the convex hull of the points $\{G_{ih}(u_0,\beta): w_h(U_i,u_0) > 0,\ 1\le i\le n\}$. Now (A.5) follows directly from the Lagrange multiplier method, as in Owen (1990).
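In practice the Lagrange multiplier in (A.5) is computed numerically. When $0$ lies in the convex hull of the positively weighted $G_{ih}(u_0,\beta)$, $\alpha_n(u_0,\beta)$ maximizes the concave function $\frac1n\sum_i K_h(U_i-u_0)\log(1+\alpha^TG_{ih}(u_0,\beta))$, whose gradient is exactly $g_n(\alpha,\beta)$. The sketch below solves $g_n = 0$ by damped Newton steps with a backtracking line search that keeps every $1+\alpha^TG_{ih}$ positive. It is an illustration under stated assumptions, not the authors' implementation: the Epanechnikov weights, the simulated two-dimensional $G_{ih}$ values and the helper name `el_multiplier` are all hypothetical.

```python
import numpy as np

def el_multiplier(G, w, n, tol=1e-12, max_iter=100):
    """Damped Newton solve of g_n(a) = (1/n) sum_i w_i G_i / (1 + a^T G_i) = 0.

    G: (m, d) values G_ih(u0, beta) for the points with w_i > 0;
    w: (m,) positive kernel weights K_h(U_i - u0); n: full sample size.
    The root maximizes the concave objective (1/n) sum_i w_i log(1 + a^T G_i).
    """
    a = np.zeros(G.shape[1])
    objective = lambda b: np.sum(w * np.log1p(G @ b)) / n
    for _ in range(max_iter):
        d = 1.0 + G @ a
        grad = (w / d) @ G / n                       # this is g_n(a)
        if np.linalg.norm(grad) < tol:
            break
        hess = (G * (w / d**2)[:, None]).T @ G / n   # minus the Hessian of the objective
        step = np.linalg.solve(hess, grad)           # Newton ascent direction
        t, f0 = 1.0, objective(a)
        while t > 1e-12:                             # backtracking line search
            cand = a + t * step
            if np.all(1.0 + G @ cand > 0) and objective(cand) >= f0:
                a = cand
                break
            t *= 0.5
        else:
            break                                    # no acceptable step: stop
    return a

rng = np.random.default_rng(1)
n, h, u0 = 400, 0.2, 0.5
U = rng.uniform(0.0, 1.0, n)
s = (U - u0) / h
w_all = np.where(np.abs(s) <= 1.0, 0.75 * (1.0 - s**2), 0.0) / h   # Epanechnikov K_h(U_i - u0)
keep = w_all > 0
G = rng.standard_normal((n, 2))[keep] + np.array([0.2, -0.1])      # hypothetical G_ih values
w = w_all[keep]

alpha_n = el_multiplier(G, w, n)
g_n = (w / (1.0 + G @ alpha_n)) @ G / n
print(alpha_n, np.linalg.norm(g_n))
```

Only points with positive weight enter the solve, mirroring the convex-hull condition over $\{G_{ih}: w_h(U_i,u_0) > 0\}$; if $0$ were outside that hull, the objective would be unbounded and no root would exist.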

Let

$$\alpha_n(u_0,\beta_0) = \rho v\qquad\text{with}\qquad \rho = \|\alpha_n(u_0,\beta_0)\|\quad\text{and}\quad \|v\| = 1.$$

We have

$$\begin{aligned}0 &= \|g_n(\alpha_n(u_0,\beta_0),\beta_0)\| = \|g_n(\rho v,\beta_0)\| \ge |v^Tg_n(\rho v,\beta_0)|\\ &= \frac{1}{n}\Big|v^T\sum_{i=1}^{n}K_h(U_i-u_0)\Big\{G_{ih}(u_0,\beta_0) - \frac{\rho\,G_{ih}(u_0,\beta_0)G_{ih}(u_0,\beta_0)^Tv}{1+\rho v^{*T}G_{ih}(u_0,\beta_0)}\Big\}\Big|\\ &\ge \frac{\rho}{n}\sum_{i=1}^{n}K_h(U_i-u_0)\frac{v^TG_{ih}(u_0,\beta_0)G_{ih}(u_0,\beta_0)^Tv}{1+\rho v^{*T}G_{ih}(u_0,\beta_0)} - |v^TA_n(u_0,\beta_0)|\\ &\ge \rho\,\frac{v^TV_n(u_0,\beta_0)v}{1+\rho Z_n(u_0,\beta_0)} - \|A_n(u_0,\beta_0)\|,\end{aligned}$$

where $v^* = tv$ with $0 < t < 1$. Thus, combining (A.4) with (A.3), Lemma 2 and condition (A9), we have

$$\rho \le \frac{\|A_n(u_0,\beta_0)\|}{v^TV_n(u_0,\beta_0)v - \|A_n(u_0,\beta_0)\|\,Z_n(u_0,\beta_0)} = O_p(\|A_n(u_0,\beta_0)\|) = o_p(n^{-1/\xi}\wedge h^{1/2})\,d_n,$$

that is,

$$\alpha_n(u_0,\beta_0) = o_p(n^{-1/\xi}\wedge h^{1/2})\,d_n.\tag{A.7}$$
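The bound on $\rho$ can be checked numerically. The sketch below treats the scalar case for simplicity (an assumption made only to keep the solver short): $\alpha_n$ is found by bisection, which works because $\sum_i w_iG_i/(1+\alpha G_i)$ is strictly decreasing on the feasible interval $\{\alpha: 1+\alpha G_i > 0\}$. It then forms $A_n$, $V_n$ and $Z_n$, reading them as the kernel-weighted mean of $G_{ih}$, the kernel-weighted second moment and the largest $|G_{ih}|$ among positively weighted points, and compares $\rho = |\alpha_n|$ with $|A_n|/(v^TV_nv - |A_n|Z_n)$. The simulated data and the helper name `alpha_scalar` are hypothetical.

```python
import numpy as np

def alpha_scalar(G, w, iters=200):
    """Root of sum_i w_i G_i / (1 + a G_i) on the feasible interval, by bisection.

    Assumes min(G) < 0 < max(G); the function is strictly decreasing there.
    """
    lo, hi = -1.0 / G.max(), -1.0 / G.min()
    pad = 1e-12 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(w * G / (1.0 + mid * G)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(4)
n, h, u0 = 2000, 0.2, 0.5
U = rng.uniform(0.0, 1.0, n)
s = (U - u0) / h
w_all = np.where(np.abs(s) <= 1.0, 0.75 * (1.0 - s**2), 0.0) / h   # Epanechnikov K_h(U_i - u0)
keep = w_all > 0
G, w = rng.standard_normal(n)[keep], w_all[keep]                   # hypothetical scalar G_ih

alpha = alpha_scalar(G, w)
rho, v = abs(alpha), np.sign(alpha)
A_n = np.sum(w * G) / n                  # kernel-weighted mean
V_n = np.sum(w * G**2) / n               # kernel-weighted second moment
Z_n = np.max(np.abs(G))                  # largest |G_ih| with positive weight
denom = v * V_n * v - abs(A_n) * Z_n
bound = abs(A_n) / denom if denom > 0 else np.inf
print(rho, bound)
```

For the exact root the inequality holds deterministically: $0 = vA_n - \rho\frac1n\sum_iK_h(U_i-u_0)G_{ih}^2/(1+\rho vG_{ih})$, and each denominator lies in $(0, 1+\rho Z_n]$, which rearranges to the displayed bound whenever $V_n > |A_n|Z_n$.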

Set $\phi_n = (h^{1/2}\wedge n^{-1/\xi})d_n$, and let $u(u_0,\beta)$ satisfy

$$u(u_0,\beta)\,\|E\{K_h(U-u_0)G_h(u_0,\beta)\}\| = E\{K_h(U-u_0)G_h(u_0,\beta)\}.$$

Define

$$\begin{aligned}l_n(u_0,\beta) &= -\frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)\log(1+\alpha_n(u_0,\beta)^TG_{ih}(u_0,\beta)),\\ T_n(u_0,\beta) &= \frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)\log(1+\phi_nu(u_0,\beta)^TG_{ih}(u_0,\beta)),\\ T_{n1}(u_0,\beta) &= \frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)\log(1+\phi_nu(u_0,\beta)^TG_{ih}(u_0,\beta))\,I(\|G_{ih}\|\le n^{1/\xi}).\end{aligned}$$

We have

$$0 \ge l_n(u_0,\beta_0) \ge -\alpha_n(u_0,\beta_0)^TA_n(u_0,\beta_0) = O_p(\phi_n^2),\tag{A.8}$$

and, uniformly for $u_0$ and $\beta$,

$$\begin{aligned}T_{n1}(u_0,\beta) &= \phi_n\frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)u(u_0,\beta)^TG_{ih}(u_0,\beta)I(\|G_{ih}\|\le n^{1/\xi}) - \frac12\phi_n^2|O(1)|\frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)F(Y_i,X_i)^2\\ &= \phi_n\{u(u_0,\beta)^TE[K_h(U-u_0)G_h(u_0,\beta)]\} + o_p(\phi_n^2) + O_p(\phi_n^2).\end{aligned}$$

Note that, for fixed $u_0$ and $\beta$, the function $-\frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)\log(1+\alpha^TG_{ih}(u_0,\beta))$ attains its minimum at $\alpha = \alpha_n(u_0,\beta)$. This implies $l_n(u_0,\beta)\le -T_n(u_0,\beta)$. Consequently, for any $\rho > 0$, by (A.8) we have

$$\begin{aligned}P(\|\hat\beta(u_0)-\beta_0\| > \rho) &\le P\Big(\sup_{\|\beta-\beta_0\|>\rho}l_n(u_0,\beta) > l_n(u_0,\beta_0)\ \text{for some }u_0\Big)\\ &\le P\Big(\sup_{\|\beta-\beta_0\|>\rho}l_n(u_0,\beta) > -|O_p(\phi_n^2)|\ \text{for some }u_0\Big)\\ &\le P\Big(\sup_{\|\beta-\beta_0\|>\rho}(-T_n(u_0,\beta)) > -|O_p(\phi_n^2)|\ \text{for some }u_0\Big)\\ &\le P\Big(\sup_{\|\beta-\beta_0\|>\rho}(-T_{n1}(u_0,\beta)) > -|O_p(\phi_n^2)|\ \text{for some }u_0\Big) + P\Big(\sup_{u_0,\beta}Z_n(u_0,\beta) > n^{1/\xi}\Big)\\ &\le P\Big(\inf_{\|\beta-\beta_0\|>\rho}\|EK_h(U-u_0)G_h(u_0,\beta)\| < |O_p(\phi_n)|\ \text{for some }u_0\Big) + o(1)\\ &\to 0,\end{aligned}$$

where the last limit follows from condition (A10). Therefore, using $\rho_{n1}\to 0$ and $\rho_2$ in condition (A10), as $n\to\infty$ we have

$$\begin{aligned}P(\|\hat\beta(u_0)-\beta_0\| > \rho_{n1}\ \text{for some }u_0) &= P(\rho_2 > \|\hat\beta(u_0)-\beta_0\| > \rho_{n1}\ \text{for some }u_0) + o(1)\\ &\le P\Big(\inf_{\rho_2>\|\beta-\beta_0\|>\rho_{n1}}\|EK_h(U-u_0)G_h(u_0,\beta)\| < |O_p(\phi_n)|\ \text{for some }u_0\Big) + o(1)\\ &\le P(\rho_{n1} + O(h^2) < |O_p(\phi_n)|) + o(1),\end{aligned}$$

which leads to

$$\hat\beta(u_0) - \beta_0 = O_p(\phi_n) = o_p(n^{-1/\xi}\wedge h^{1/2})\,d_n.$$

Invoking the argument of Owen (1990) and Lemma 1 again, we have $\alpha_n(u_0,\hat\beta(u_0)) = o_p(n^{-1/\xi}\wedge h^{1/2})$ uniformly in $u_0$. This completes the proof. □
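The facts used around (A.8), namely $l_n \le 0$, $l_n \ge -\alpha_n^TA_n$ (from $\log(1+x)\le x$), and the minimizing property of $\alpha_n$ that gives $l_n \le -T_n$, are easy to see numerically in the scalar case. The sketch below assumes a one-dimensional $G_{ih}$ with Epanechnikov weights (illustrative choices, with hypothetical helper names), finds $\alpha_n$ by bisection on the feasible interval, and compares $l_n$ with the objective evaluated on a grid of other feasible $\alpha$.

```python
import numpy as np

def alpha_scalar(G, w, iters=200):
    """Root of sum_i w_i G_i / (1 + a G_i) by bisection; assumes min(G) < 0 < max(G)."""
    lo, hi = -1.0 / G.max(), -1.0 / G.min()
    pad = 1e-12 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(w * G / (1.0 + mid * G)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def neg_log_el(a, G, w, n):
    """-(1/n) sum_i w_i log(1 + a G_i); alpha_n minimizes this over feasible a."""
    return -np.sum(w * np.log1p(a * G)) / n

rng = np.random.default_rng(5)
n, h, u0 = 500, 0.2, 0.5
U = rng.uniform(0.0, 1.0, n)
s = (U - u0) / h
w_all = np.where(np.abs(s) <= 1.0, 0.75 * (1.0 - s**2), 0.0) / h   # Epanechnikov K_h(U_i - u0)
keep = w_all > 0
G, w = rng.standard_normal(n)[keep] + 0.1, w_all[keep]             # hypothetical, off-centre G_ih

alpha_n = alpha_scalar(G, w)
l_n = neg_log_el(alpha_n, G, w, n)
A_n = np.sum(w * G) / n
grid = np.linspace(-1.0 / G.max(), -1.0 / G.min(), 1001)[1:-1]     # interior feasible points
others = np.array([neg_log_el(a, G, w, n) for a in grid])
print(l_n <= 0.0, -alpha_n * A_n <= l_n, bool(np.all(others >= l_n - 1e-12)))
```

Taking $a = \phi_n u$ among the grid points is exactly the comparison $l_n(u_0,\beta)\le -T_n(u_0,\beta)$ used in the proof.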

Lemma 4. Suppose that, for some positive constants $b_0$ and $b_1$, $b_0 < hn^{\eta} < b_1$ with $0 < \eta < 1/2$. Then, under conditions (K0), (U0), (A2), (A6), (A7) and (A9), as $n\to\infty$, we have

$$V_n(u_0,\alpha,\beta) = V(u_0)\otimes(S\otimes\Gamma(u_0))(1+o_p(h^{1/2}))$$

uniformly for $u_0\in\Omega$ and $\|\alpha\| + \|\beta-\beta_0\| < o(n^{-1/\xi}\wedge h^{1/2})$.

Proof. Note that under condition (A2) we have

$$\sup_{\|\beta-\beta_0\|<r_0}Z_n(u_0,\beta) = o_p(n^{1/\xi}),$$

which, together with Lemma 2, yields

$$\begin{aligned}V_n(u_0,\alpha,\beta) &= V_n(u_0,\beta) + O_p(\|\alpha\|)\Big\{\frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)F(Y_i,X_i)^3\Big\}(1+o_p(1))\\ &= V_n(u_0,\beta) + O_p(\|\alpha\|)\\ &= V(u_0)\otimes(S\otimes\Gamma(u_0)) + o_p(h^{1/2}) + O_p(\|\alpha\|).\end{aligned}$$

The proof is complete. □
Lemma 5. Suppose there exist positive constants $b_0$, $b_1$ and $\eta$ such that $b_0 < hn^{\eta} < b_1$ and $0 < \eta < 1/2$. Then, under conditions (K0), (U0), (A2) and (B1)–(B4), as $n\to\infty$,

$$B_n(u_0,\alpha,\beta) = D(u_0)\otimes(S\otimes\Gamma(u_0))(1+o_p(h^{1/2}))$$

uniformly for $u_0\in\Omega$ and $\|\alpha\| + \|\beta-\beta_0\| < o(n^{-1/\xi}\wedge h^{1/2})$.

The proof is similar to that of Lemma 1 and is thus omitted.

Lemma 6. Under conditions (K0), (U0), (A2) and (B1), as $h\to 0$ and $nh\to\infty$,

$$C_n(u_0,\alpha,\beta) = O_p(\|\alpha\|)$$

uniformly for $u_0\in\Omega$ and $\|\alpha\| + \|\beta-\beta_0\| < o(n^{-1/\xi}\wedge h^{1/2})$.

Proof. Note that, by condition (A2) and $\|\alpha\| < o(n^{-1/\xi}\wedge h^{1/2})$, we have

$$\max_{1\le i\le n}\ \sup_{\beta,\,u_0}|\alpha^TG_{ih}(u_0,\beta)| = o_p(1).$$

Thus

$$\|C_n(u_0,\alpha,\beta)\| \le O_p(\|\alpha\|)\,\frac{1}{n}\sum_{i=1}^{n}K_h(U_i-u_0)F_4(Y_i,X_i)F(Y_i,X_i) = O_p(\|\alpha\|)$$

by conditions (A2) and (B1). The proof is complete. □

Lemma 7. Under conditions (K0), (U0) and (B5), as $h\to 0$ and $nh\to\infty$,

$$D_n(u_0,\alpha,\beta) = O_p(\|\alpha\|)$$

uniformly for $u_0\in\Omega$ and $\|\alpha\| + \|\beta-\beta_0\| < o(n^{-1/\xi}\wedge h^{1/2})$.

Lemma 8. Under conditions (K0), (U0) and (B5), as $h\to 0$ and $nh\to\infty$,

$$E_n(u_0,\alpha,\beta) = O_p(\|\alpha\|^2)$$

uniformly for $u_0\in\Omega$ and $\|\alpha\| + \|\beta-\beta_0\| < o(n^{-1/\xi}\wedge h^{1/2})$.

The proofs of Lemmas 7 and 8 are similar to the proof of Lemma 6 and are thus omitted.